For the greater part of 30 years, in-line inspection (ILI) technologies have been collecting data on pipelines from all over the world. Now, the accumulated data lake has matured to a point where it can begin to power modern artificial intelligence (AI) solutions for inspection, integrity and risk analysis. In the second part of this series, ROSEN experts Daniel Sandana, Michael Smith, Rodolphe Jamo and Roland Palmer-Jones from our Integrity Solutions line of business explore different aspects of the Integrity Data Warehouse (IDW) – continuing with machine learning for stress corrosion cracking (SCC) management.


In Part 1 of this series, we introduced ROSEN’s Integrity Data Warehouse (IDW), a data repository containing detailed integrity management information for almost 10,000 pipelines from around the world (Figure 1). The IDW will soon include information from the majority of inspections since 2000, as well as information from all newly completed inspections.

Figure 1 – Contents of the Integrity Data Warehouse (IDW)

In the first article, we described how supervised machine learning techniques could be used to predict the condition of uninspected pipelines. The concept was exemplified by looking at the specific case of external corrosion prediction.

The subject of Part 2 is a predictive-analytics project conducted for a major pipeline operator in 2020. In this project, we explored how similar techniques could be applied to the even more complex phenomenon of stress-corrosion cracking (SCC).


As many of us are aware, the identification of SCC is challenging with crack inspection systems alone, while alternatives such as direct assessment and susceptibility-based methodologies [1] come with even greater uncertainties [2]. As discussed in ROSEN’s 2020 white paper on the topic [3], the industry is still striving to understand the various facets of the materials-environment-stress system (Figure 2) that influence SCC. For example, we have yet to establish conclusive links between microscale structure and chemistry, the macroscopic behavior of an alloy, and their combined influence on the occurrence of SCC.

Figure 2 – Drivers for SCC


Crack management remains a complex technical problem, partly because there are still fundamental gaps in our knowledge. It is information intensive and relies heavily on the judgement of experts, and the occurrence and growth behavior of environmental cracks are difficult to model. Under these circumstances, it is worth asking whether other approaches could be brought to bear on the problem. Machine learning is one such approach.


The target pipeline for the study was a crude oil pipeline with a known threat of external SCC, as identified during historical in-line inspections (ILI) and field verifications. However, due to limitations in crack detection performance – regardless of the ILI system deployed – the operator was concerned about the presence of undetected SCC.

To evaluate the merits of data analytics techniques for crack management, the operator commissioned ROSEN to conduct a six-month pilot study to predict the likely locations of cracking on the pipeline. The output of the study was a predictive model that estimated the probability of cracking being present at each pipe joint, based on characteristics of the pipeline and the local environment. The goal was to find cracking that may have escaped detection by the ILI systems deployed, perhaps because of the nature, morphology and size of the cracking, or because of its location within the pipeline. The model was trained (via supervised machine learning) on data specific to the target pipeline (information on design/construction, environment, operations, cathodic protection [CP], ILI and field verification), combined with information from ROSEN’s IDW. The final dataset comprised over 50 variables, all relevant to the physics of SCC, describing more than 25,000 of the pipeline’s pipe joints.
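To make the idea of a per-joint probability concrete, the following is a minimal sketch of supervised classification on synthetic data. It is not the model used in the study: the article does not name the algorithm, so plain logistic regression is assumed here, and the three features (`coating_disbonded`, `cp_shortfall`, `stress_ratio`), their values and the "true" relationship that generates the labels are all hypothetical.

```python
import math
import random

# Hypothetical feature order for each pipe joint (not the study's variables):
# [coating_disbonded (0 or 1), cp_shortfall (0..1), stress_ratio (0..1)]

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(model, x):
    """Estimated probability that cracking is present at a joint."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Synthetic joints: labels are drawn from an assumed relationship,
# standing in for ILI and field-verification findings.
rng = random.Random(0)
rows, labels = [], []
for _ in range(300):
    x = [float(rng.random() < 0.3), rng.random(), rng.random()]
    p_true = sigmoid(-3.0 + 2.5 * x[0] + 2.0 * x[1] + 1.5 * x[2])
    rows.append(x)
    labels.append(1 if rng.random() < p_true else 0)

model = fit_logistic(rows, labels)
p_high = predict_proba(model, [1.0, 0.9, 0.9])  # adverse conditions
p_low = predict_proba(model, [0.0, 0.1, 0.1])   # benign conditions
print(f"high-risk joint: {p_high:.2f}, low-risk joint: {p_low:.2f}")
```

A real application would involve far more variables and joints, and most likely a more flexible model family, but the output has the same shape: one probability per pipe joint.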

An example of the model outputs (predicted probability of SCC presence) for one of the pipeline sections is visualized in Figure 3. Each point represents a unique pipe joint and is colored according to the corresponding probability. Higher probability points are also plotted larger for clarity.

Figure 3 – Probability of crack presence for the target pipeline


A detailed probability profile such as this one raises a very natural question: Will we find cracks if we excavate one of the higher-probability (yellow) areas? Although cross-validation indicates promising model performance, such questions can only be answered with future field verification data.
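For readers unfamiliar with cross-validation, the sketch below shows the general idea: the model is repeatedly trained on part of the data and scored on joints it has not seen. Everything here is an assumption made for illustration. The article does not describe the validation scheme or the metric, so a shuffled five-fold split and the area under the ROC curve (AUC) are used, and the "model" is a deliberately simple crack rate per coating type fitted to synthetic data.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def auc(scores, labels):
    """Chance that a random cracked joint is scored above a random clean one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_rate_by_category(features, labels):
    """Toy model: observed crack rate for each category value."""
    counts = {}
    for cat, y in zip(features, labels):
        hits, total = counts.get(cat, (0, 0))
        counts[cat] = (hits + y, total + 1)
    rates = {cat: hits / total for cat, (hits, total) in counts.items()}
    return rates, sum(labels) / len(labels)

def predict_rate(model, cat):
    """Category rate, falling back to the overall rate if unseen."""
    rates, overall = model
    return rates.get(cat, overall)

def cross_validate(features, labels, fit, predict, k=5):
    """Train on k-1 folds, score the held-out fold, and repeat k times."""
    fold_scores = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = [predict(model, features[i]) for i in held_out]
        fold_scores.append(auc(preds, [labels[i] for i in held_out]))
    return fold_scores

# Synthetic joints with a hypothetical coating effect on cracking.
rng = random.Random(1)
coatings = ["tape" if i % 2 == 0 else "fbe" for i in range(200)]
labels = [1 if rng.random() < (0.35 if c == "tape" else 0.05) else 0
          for c in coatings]

fold_aucs = cross_validate(coatings, labels, fit_rate_by_category, predict_rate)
print("mean held-out AUC:", round(sum(fold_aucs) / len(fold_aucs), 2))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. As the paragraph above notes, held-out scores of this kind are encouraging at best; they do not replace field verification.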

Nevertheless, what merits some further discussion at this stage is the capacity of such techniques to create models that “respect” the physical nature of complex processes and mechanisms such as SCC. This point is corroborated by Figure 4, which shows the contribution from the top 10 most influential variables in the SCC model. It is important to note that these top 10 variables were identified by the machine learning model without being pre-identified or pre-weighted in any way by expert opinion. The data alone tells us that these variables are important.

Intuitively, the listed variables are in line with the aforementioned materials-environment-stress framework necessary for SCC, giving some real credibility to the model.

Figure 4 – Top 10 most influential variables for SCC prediction on the target pipeline
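A ranking like the one in Figure 4 can be produced in several ways, and the article does not say which was used. One common, model-agnostic option is permutation importance, sketched below as an assumption rather than as the study's method: each variable is shuffled in turn, and the resulting drop in predictive accuracy is taken as its contribution. The scoring rule and the three columns (`cp_shortfall`, `stress_ratio`, `unrelated_noise`) are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Share of joints where the thresholded prediction matches the label."""
    hits = sum(1 for x, y in zip(rows, labels)
               if (predict(x) >= 0.5) == bool(y))
    return hits / len(labels)

def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in rows]
            rng.shuffle(column)  # break the link between column j and labels
            shuffled = [x[:j] + [v] + x[j + 1:] for x, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical columns: [cp_shortfall, stress_ratio, unrelated_noise]
names = ["cp_shortfall", "stress_ratio", "unrelated_noise"]

def predict(x):
    """Stand-in for a trained model; it ignores the third column."""
    return 0.7 * x[0] + 0.3 * x[1]

rng = random.Random(42)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
labels = [1 if predict(x) >= 0.5 else 0 for x in rows]

importances = permutation_importance(predict, rows, labels)
for name, value in sorted(zip(names, importances), key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")
```

Because the stand-in model ignores `unrelated_noise`, shuffling that column changes nothing and its importance is exactly zero, while the more heavily weighted `cp_shortfall` ranks first. No expert weighting enters the calculation, which is the property the paragraph above emphasizes.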


One variable, in particular, deserves some further attention: the IDW susceptibility score. What was this variable, and what exactly was the role of the IDW in the prediction of SCC on the target pipeline?

At first glance, the materials aspect seems to be underrepresented in Figure 4, with no explicit materials-related parameters appearing within the top 10 variables. In practice, this is a common observation in SCC prediction models, since the most relevant parameters (microstructures, chemistries, etc.) are rarely accessible over the full length of a pipeline. The default strategy is to use proxy variables such as pipe grade, manufacturer and heat numbers, but due to a lack of granularity and reliability (particularly in older systems), these variables tend not to convey as much information. In the present case, a large proportion of the target pipeline was constructed with API 5L Grade X60 (with some smaller sections of X52, X42, TSE 360, etc.), with predominantly submerged arc welded (SAW) seams and a smaller number of electric resistance welded (ERW) seams. Unless cracking is specifically identified in minority populations (which is not the case here), variables such as pipe grade and weld type are unlikely to have significant predictive power.

This problem is addressed to some extent with the IDW susceptibility score. The score was derived from observations of several hundred crack inspections represented in the IDW. The idea is to learn from trends in the relationship between design and construction characteristics (such as pipe grade, weld type and coating type) and the condition of the pipelines. These trends can then be formalized in a predictive model and encapsulated within the IDW susceptibility score.

Since the trends are learned from a much larger and more diverse sample of pipelines, the score turns out to be a reliable predictor of whether SCC is present, hence its position as the fifth most important variable in the model. It seems credible that the IDW susceptibility score provides a partial proxy for materials aspects that cannot be considered directly.
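The construction of the IDW susceptibility score is not detailed here, so the following is only a sketch of the general idea under stated assumptions: crack findings from a broad pool of inspected pipelines are summarized per combination of design and construction attributes, and the resulting rate is passed to the pipeline-specific model as one additional variable. A smoothed lookup table is swapped in for the predictive model mentioned above; the pool, the attribute values, the counts and the `prior_weight` parameter are all hypothetical.

```python
from collections import defaultdict

def fit_susceptibility(records, prior_weight=5.0):
    """Learn a smoothed crack rate per (grade, seam, coating) combination.

    Rates for sparsely observed combinations are pulled toward the
    overall rate; prior_weight controls how strongly.
    """
    overall = sum(r["cracked"] for r in records) / len(records)
    tallies = defaultdict(lambda: [0, 0])  # key -> [cracked, inspected]
    for r in records:
        key = (r["grade"], r["seam"], r["coating"])
        tallies[key][0] += r["cracked"]
        tallies[key][1] += 1
    table = {key: (hits + prior_weight * overall) / (total + prior_weight)
             for key, (hits, total) in tallies.items()}
    return table, overall

def susceptibility_score(model, grade, seam, coating):
    """Look up the learned rate; unseen combinations get the overall rate."""
    table, overall = model
    return table.get((grade, seam, coating), overall)

# Hypothetical pool of inspected pipe populations (not IDW data).
pool = (
    [{"grade": "X60", "seam": "SAW", "coating": "tape", "cracked": 1}] * 12
    + [{"grade": "X60", "seam": "SAW", "coating": "tape", "cracked": 0}] * 28
    + [{"grade": "X52", "seam": "ERW", "coating": "FBE", "cracked": 1}] * 1
    + [{"grade": "X52", "seam": "ERW", "coating": "FBE", "cracked": 0}] * 59
)

idw_model = fit_susceptibility(pool)
overall_rate = idw_model[1]
score_tape = susceptibility_score(idw_model, "X60", "SAW", "tape")
score_fbe = susceptibility_score(idw_model, "X52", "ERW", "FBE")
score_unseen = susceptibility_score(idw_model, "X42", "SAW", "other")
print(round(score_tape, 3), round(score_fbe, 3), round(score_unseen, 3))
```

The point of the design is that the score is learned outside the target pipeline. A joint's attributes may carry little information within a single, fairly uniform line, yet the same attributes can be informative when compared against many pipelines.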


The data available in the IDW will continue to grow – and with it the power to predict pipeline condition and, even better, to identify effective mitigation activities. The strength lies in both the depth and the breadth of the database as well as in the quality of the information.

The depth of the database can be rapidly increased by adding data from past and future in-line inspections for metal loss, cracking, deformation, material properties and so on. The breadth will increase as we consider different inspection technologies, including newer technologies such as the pipe grade sensor and axial stress measurement. The breadth will also increase as we add the results of field verification and laboratory testing. These additional measurements (while much more limited in extent than ILI) provide rich additional layers that can help our understanding and modeling of the many influential factors.

Just as the relatively simple but extensive crack data from the IDW helped to predict the location of cracking in this specific case, the addition of extensive information on material strength, axial stress and coating condition, plus the myriad of information available from in-the-ditch verification on soil types, steel microstructure and crack morphology, will enhance our ability to support and guide the industry in managing even the most complex of threats.

What is also clear, even at this relatively early stage, is that the quality of the information has a major impact on the power of the IDW. Good-quality data relies first and foremost on the initial collection and evaluation of the data, and great care must be taken thereafter in extracting the information from listings, field reports and laboratory investigations to ensure consistency and relevance.

The scale of this data engineering exercise must not be underestimated, but as we can see from the examples of predicting corrosion in uninspected pipelines or predicting the presence of SCC in a specific pipeline, the value for the industry can be immense.


[1] SCC Direct Assessment Methodology, NACE SP0204, 2008.
[2] D. Sandana, High-pH and Near-Neutral pH SCC of pipelines: challenges in susceptibility modelling, Pipeline International Journal, Fall 2018.
[3] D. Sandana, M. Smith, R. Palmer-Jones, Navigating the minefield of external SCC management, NACE Materials Performance, September 2020.