For the greater part of 30 years, ILI technologies have been collecting data on pipelines from all over the world. Now, the accumulated data lake has matured to a point where it can begin to power modern artificial intelligence (AI) solutions for inspection, integrity and risk analysis. In this new series of articles, Michael Smith and Roland Palmer-Jones from our Integrity Solutions business line explore different aspects of the Integrity Data Warehouse (IDW) – starting with condition prediction for uninspected pipelines.


This article will look at some of the exciting new applications generated by ROSEN’s Integrity Analytics initiative – a project designed to investigate how AI techniques can support pipeline operators with their integrity management decisions. The initiative offers a meaningful insight into what can be achieved when vast quantities of information are collated.

But where is the information exactly?

As part of the Integrity Analytics project, ROSEN is developing a large repository of historical ILI results (feature listings) and corresponding pipeline information, called the Integrity Data Warehouse (IDW).
To date, the IDW contains detailed information for almost 10,000 pipelines from around the world (Figure 1). The IDW is growing rapidly and will soon include information from the majority of inspections since 2000, as well as information from all newly completed inspections.

Figure 1 – Contents of the Integrity Data Warehouse (IDW)

In this article, we will take a look at the first major application enabled by the IDW: condition prediction for uninspected pipelines.


Although the core business of the ROSEN Group is pipeline inspection, we know that approximately 40% of the world’s pipelines cannot be inspected using ILI (even when we deploy novel technologies).

We also know that there is a wealth of valuable information available for many of the 60% that can be inspected.

This leads us naturally towards supervised machine learning as a monitoring solution for those pipelines that cannot be inspected. We can observe trends in inspected pipelines and apply what we have learned to uninspected pipelines (Figure 2).


Figure 2 – Supervised machine learning for condition prediction

This technique can be applied to a number of different pipeline threats (Figure 3), but initial work has focused heavily on the endemic and ever-present threat of external corrosion.

Figure 3 – Pipeline threats



As a first attempt at condition prediction, we tested the feasibility of predicting the number of external corrosion anomalies in pipelines, using basic design and construction information alone – such as age, coating type and geographical location. More specifically, the target variable was anomaly density – the number of anomalies per unit length. This is a primitive but useful indication of the overall integrity status.
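As a minimal sketch of the target variable, anomaly density can be computed as the anomaly count divided by the pipeline length. The function name, the choice of kilometers as the unit, and the three example pipelines below are illustrative assumptions, not values taken from the IDW.

```python
def anomaly_density(anomaly_count: int, length_km: float) -> float:
    """Return the number of anomalies per kilometer for one pipeline."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    if anomaly_count < 0:
        raise ValueError("anomaly_count must not be negative")
    return anomaly_count / length_km


# Hypothetical pipelines: (name, reported external corrosion anomalies, length in km)
example_pipelines = [
    ("A", 2, 250.0),
    ("B", 480, 12.0),
    ("C", 150_000, 10.0),
]

# Map each pipeline name to its anomaly density in anomalies per km.
densities = {
    name: anomaly_density(count, length_km)
    for name, count, length_km in example_pipelines
}
```

Normalizing by length is what makes a short, heavily corroded line comparable with a long, nearly pristine one.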

Looking at the distribution of anomaly density (anomalies per kilometer) for a random sample of pipelines from the IDW (Figure 4), we can see that the values vary over several orders of magnitude. At the lower end of the distribution, we find pristine pipelines with fewer than one anomaly per 100 kilometers, while at the upper end, there are pipelines with more than 10,000 anomalies per kilometer.

Figure 4 – Anomaly densities for IDW pipelines (log10 scale)



To make things easier for the machine learning algorithm, we therefore bin the anomaly densities into four classes, as shown in Figure 5.
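The binning step can be sketched as follows. The class boundaries and labels here are placeholders chosen for illustration only; the actual boundaries are those shown in Figure 5 and are not reproduced in this sketch.

```python
from bisect import bisect_right

# Hypothetical class boundaries in anomalies per km (NOT the values from Figure 5).
CLASS_EDGES = (1.0, 10.0, 100.0)
# Hypothetical labels for the four resulting classes, ordered from best to worst.
CLASS_LABELS = ("very low", "low", "high", "very high")


def density_class(density_per_km: float) -> int:
    """Map an anomaly density to an ordinal class index from 0 to 3.

    A density exactly equal to a boundary falls into the higher class.
    """
    if density_per_km < 0:
        raise ValueError("density_per_km must not be negative")
    return bisect_right(CLASS_EDGES, density_per_km)


# Example densities (anomalies per km) spanning several orders of magnitude.
example_classes = [density_class(d) for d in (0.008, 1.0, 40.0, 15000.0)]
example_labels = [CLASS_LABELS[c] for c in example_classes]
```

Turning a target that spans many orders of magnitude into a small number of ordered classes converts the problem into a classification task, at the cost of resolution within each class.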

By feeding the design and construction variables and observed anomaly density classes into a supervised machine learning algorithm, we can create a predictive model. The model can then be tested on an unseen sample of pipelines to measure its performance.

Figure 5 – Anomaly density classification

The first prototype model (a “boosted decision tree”) was trained on ~4,000 pipelines from the IDW and tested on ~1,000 unseen pipelines.
Although these ~1,000 pipelines had also been inspected in the past, their ILI results were not used to train the model. The predictions for these pipelines were generated based on their design and construction variables alone (as would be the case for truly uninspected pipelines), and their ILI information was used exclusively for validation.
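The hold-out arrangement described above can be sketched as below. The article does not state how the test pipelines were selected, so the seeded random split, the 20% test fraction and the integer pipeline identifiers are all assumptions made for illustration.

```python
import random


def holdout_split(pipeline_ids, test_fraction=0.2, seed=0):
    """Split pipeline identifiers into disjoint training and test lists.

    Only the training pipelines' ILI results would be used to fit the model;
    the test pipelines' ILI results are kept back purely for validation.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ids = sorted(pipeline_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical identifiers for 5,000 inspected pipelines.
train_ids, test_ids = holdout_split(range(5000), test_fraction=0.2)
```

Splitting by whole pipeline, rather than by individual anomaly, keeps every record from a test pipeline out of training, which mirrors the situation of a truly uninspected line.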

Even though external corrosion is an exceptionally complex phenomenon, the performance of the preliminary model was surprisingly good (Figure 6).

Figure 6 – Confusion matrix – prediction results for test dataset


Overall, 58% of the test pipelines were assigned to the correct class, and 92% were predicted within one class of the true value (roughly speaking, within one order of magnitude of the true anomaly density).
While order-of-magnitude predictions sound modest, the model outperforms traditional risk assessment solutions by a considerable margin.
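Both headline figures can be read directly off a confusion matrix: the exact-match rate is the share of pipelines on the diagonal, and the within-one-class rate also counts the cells immediately adjacent to it. The sketch below shows the calculation; the counts are invented for illustration and are not the values from Figure 6.

```python
def accuracy_within(confusion, tolerance=0):
    """Share of samples whose predicted class is within `tolerance` classes
    of the true class, given a square confusion matrix of counts
    (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    hits = sum(
        count
        for true_idx, row in enumerate(confusion)
        for pred_idx, count in enumerate(row)
        if abs(true_idx - pred_idx) <= tolerance
    )
    return hits / total


# Invented counts for a four-class problem (40 pipelines in total).
example_confusion = [
    [6, 3, 1, 0],
    [2, 5, 2, 1],
    [1, 2, 6, 1],
    [0, 1, 3, 6],
]

exact_rate = accuracy_within(example_confusion, tolerance=0)
within_one_rate = accuracy_within(example_confusion, tolerance=1)
```

The within-one-class measure only makes sense because the classes are ordered; for unordered categories, the distance between class indices would carry no meaning.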

Nevertheless, those paying attention will spot the outliers in the bottom left and top right corners of the confusion matrix, where the predicted class is furthest from the true class. To reduce these large misclassifications, the next step is to incorporate new predictor variables, including local soil properties (pH, resistivity, etc.), land use, climate, terrain and even socioeconomic status. With higher-granularity data, it will also be possible to predict anomaly densities (or other aspects of external corrosion condition) for smaller segments, or even pipe joints (Figure 7).

Figure 7 – High-resolution external corrosion prediction



This approach is now being developed further with the support of a major pipeline operator, and early results are extremely promising.
As the accuracy of the predictions improves, they will be used to prioritize rehabilitation activities and, where appropriate, justify the costs of the modifications needed to allow internal inspection. There are many other potential applications, such as predicting the future condition of inspected lines, so look out for more examples in future articles. In addition, if you have any ideas for applications of the IDW, then do not hesitate to get in touch with the authors.

De Leon, C and Smith, M S (2020). Machine Learning to Support Risk and Integrity Management. Pipeline Pigging & Integrity Management (PPIM) Conference, February 2020, Houston, Texas, United States of America.