Author: Roland Palmer-Jones

Condition Prediction for Uninspected Pipelines – IDW Part 1

Applications of the Integrity Data Warehouse

In a Nutshell:

For the greater part of 30 years, in-line inspection (ILI) technologies have been collecting data on pipelines from all over the world. The accumulated data lake has now matured to the point where it can begin to power modern artificial intelligence (AI) solutions for inspection, integrity and risk analysis. In this new series of articles, Roland Palmer-Jones explores different aspects of the Integrity Data Warehouse (IDW), starting with condition prediction for uninspected pipelines.

The Integrity Data Warehouse

This article will look at some of the exciting new applications generated by ROSEN’s Integrity Analytics initiative – a project designed to investigate how AI techniques can support pipeline operators with their integrity management decisions. The initiative offers a meaningful insight into what can be achieved when vast quantities of information are collated.

But where is the information exactly?

As part of the Integrity Analytics project, ROSEN is developing a large repository of historical ILI results (feature listings) and corresponding pipeline information, called the Integrity Data Warehouse (IDW). To date, the IDW contains detailed information for almost 10,000 pipelines from around the world (Figure 1). The IDW is growing rapidly and will soon include information from the majority of inspections since 2000, as well as information from all newly completed inspections.

Figure 1 – Contents of the Integrity Data Warehouse (IDW): in-line inspection data, design and construction data, environmental data, operational data and surveys.

In this article, we will take a look at the first major application enabled by the IDW: condition prediction for uninspected pipelines.

Condition prediction for uninspected pipelines

Although the core business of the ROSEN Group is pipeline inspection, we know that approximately 40% of the world’s pipelines cannot be inspected using ILI (even when we deploy novel technologies).

We also know that there is a wealth of valuable information available for many of the 60% that can be inspected.

This leads us naturally towards supervised machine learning as a monitoring solution for those pipelines that cannot be inspected. We can observe trends in inspected pipelines and apply what we have learned to uninspected pipelines (Figure 2).

Figure 2 – Supervised machine learning for condition prediction: trends learned from several inspected pipelines are applied to a single uninspected pipeline.

Figure 3 – Pipeline threats: corrosion, cracking, bending strain, geometric defects and third-party damage.

This technique can be applied to a number of different pipeline threats (Figure 3), but initial work has focused heavily on the endemic and ever-present threat of external corrosion.


As a first attempt at condition prediction, we tested the feasibility of predicting the number of external corrosion anomalies in pipelines, using basic design and construction information alone – such as age, coating type and geographical location. More specifically, the target variable was anomaly density – the number of anomalies per unit length. This is a primitive but useful indication of the overall integrity status.
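As a minimal sketch of the target variable, anomaly density can be computed as the anomaly count divided by the pipeline length. The pipeline names, counts and lengths below are invented for illustration and are not taken from the IDW.

```python
import math

def anomaly_density(anomaly_count: int, length_km: float) -> float:
    """Return the number of anomalies per kilometer for one pipeline."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    return anomaly_count / length_km

# Hypothetical pipelines: (name, external corrosion anomaly count, length in km)
pipelines = [
    ("line_a", 3, 120.0),
    ("line_b", 450, 45.0),
    ("line_c", 26000, 13.0),
]

densities = {name: anomaly_density(count, length) for name, count, length in pipelines}

for name, density in densities.items():
    # log10 is undefined for a density of zero; a pipeline with no reported
    # anomalies would need separate handling (not shown here).
    print(f"{name}: {density:.3f} anomalies/km (log10 = {math.log10(density):.2f})")
```

Taking the logarithm is only a convenience for comparing values that differ by orders of magnitude; the density itself remains the quantity of interest.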

Figure 4 – Anomaly densities for IDW pipelines (log10 scale): frequency of pipelines by anomaly density.

Looking at the distribution of anomaly density (anomalies per kilometer) for a random sample of pipelines from the IDW (Figure 4), we can see that the values vary over several orders of magnitude. At the lower end of the distribution, we find pristine pipelines with fewer than one anomaly per 100 kilometers, while at the upper end, there are pipelines with more than 10,000 anomalies per kilometer.


To make things easier for the machine learning algorithm, we therefore bin the anomaly densities into four classes, as shown in Figure 5.
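The binning step could be sketched as follows. The class boundaries used here (1, 10 and 100 anomalies per kilometer) are assumptions chosen to sit one order of magnitude apart; the boundaries actually used are those defined in Figure 5 and are not reproduced in the text.

```python
import bisect

# Hypothetical upper boundaries in anomalies per km (assumed, not from the article).
BOUNDARIES = [1.0, 10.0, 100.0]
CLASSES = ["low", "moderate", "high", "very high"]

def density_class(density_per_km: float) -> str:
    """Map an anomaly density to one of four ordered classes.

    A density exactly on a boundary falls into the higher class.
    """
    return CLASSES[bisect.bisect_right(BOUNDARIES, density_per_km)]

for density in (0.025, 5.0, 10.0, 2000.0):
    print(f"{density:>8} anomalies/km -> {density_class(density)}")
```

Turning a continuous target into a handful of ordered classes converts the task from regression into classification, which is the form the rest of the article evaluates.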

By feeding the design and construction variables and observed anomaly density classes into a supervised machine learning algorithm, we can create a predictive model. The model can then be tested on an unseen sample of pipelines to measure its performance.

Figure 5 – Anomaly density classification: anomaly density ranges for the classes "low", "moderate", "high" and "very high", shown alongside the frequency distribution of anomaly density.

The first prototype model (a “boosted decision tree”) was trained on ~4,000 pipelines from the IDW and tested on ~1,000 unseen pipelines. Although these ~1,000 pipelines had also been inspected in the past, their ILI results were not used to train the model. The predictions for these pipelines were generated based on their design and construction variables alone (as would be the case for truly uninspected pipelines), and their ILI information was used exclusively for validation.
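The hold-out procedure described above can be illustrated with a small sketch. It deliberately does not implement a boosted decision tree: a trivial stand-in model (the most common class per coating type) takes its place so that the example stays self-contained. The records, field names and helper functions are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def split_holdout(records, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then split into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_majority_by_coating(train):
    """Stand-in model: most common class per coating, with a global fallback."""
    per_coating = defaultdict(Counter)
    overall = Counter()
    for rec in train:
        per_coating[rec["coating"]][rec["density_class"]] += 1
        overall[rec["density_class"]] += 1
    fallback = overall.most_common(1)[0][0]
    table = {c: counts.most_common(1)[0][0] for c, counts in per_coating.items()}
    return lambda features: table.get(features["coating"], fallback)

# Made-up, perfectly separable records (real data would be far noisier).
records = [
    {
        "pipeline_id": i,
        "coating": "coal tar" if i % 2 == 0 else "FBE",
        "density_class": "high" if i % 2 == 0 else "low",
    }
    for i in range(50)
]

train, test = split_holdout(records)
predict = fit_majority_by_coating(train)

# Predictions see design variables only; the ILI-derived class is withheld
# and used afterwards purely to score the predictions.
correct = 0
for rec in test:
    features = {k: v for k, v in rec.items() if k != "density_class"}
    correct += predict(features) == rec["density_class"]
accuracy = correct / len(test)
print(f"{len(train)} training pipelines, {len(test)} test pipelines, accuracy {accuracy:.2f}")
```

The point of the sketch is the separation of roles: the observed classes of the test pipelines never reach the fitting step, so the score reflects how the model would behave on pipelines that have genuinely never been inspected.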

Figure 6 – Confusion matrix: true class versus predicted class for the test dataset.

Even though external corrosion is an exceptionally complex phenomenon, the performance of the preliminary model was surprisingly good (Figure 6).

Of the test pipelines, 58% were assigned to exactly the correct class, and 92% were predicted within one class of the true value (roughly speaking, within one order of magnitude of the true anomaly density). While order-of-magnitude predictions sound modest, the model outperforms traditional risk assessment solutions by a considerable margin.
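To make the two headline metrics concrete, the sketch below builds a confusion matrix from a short list of true and predicted classes and computes the exact-match rate and the within-one-class rate. The ten label pairs are invented for illustration and do not reproduce the article's results.

```python
CLASSES = ["low", "moderate", "high", "very high"]

def confusion_matrix(true_labels, predicted_labels):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix

def exact_and_within_one(matrix):
    """Return (fraction on the diagonal, fraction at most one class off)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    exact = sum(matrix[i][i] for i in range(n))
    within_one = sum(
        matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) <= 1
    )
    return exact / total, within_one / total

# Made-up labels for ten test pipelines.
true_labels = ["low", "low", "moderate", "moderate", "high",
               "high", "very high", "very high", "low", "very high"]
predicted_labels = ["low", "moderate", "moderate", "high", "high",
                    "moderate", "very high", "high", "very high", "low"]

matrix = confusion_matrix(true_labels, predicted_labels)
exact_rate, within_one_rate = exact_and_within_one(matrix)
print(f"exact: {exact_rate:.0%}, within one class: {within_one_rate:.0%}")
```

In this layout, the corner cells `matrix[0][3]` and `matrix[3][0]` hold the worst misclassifications, where a "low" pipeline is predicted as "very high" or the reverse.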

Nevertheless, those paying attention will spot the outliers in the bottom left and top right corners of the confusion matrix. To correct those inaccurate predictions, the next step is to incorporate new predictor variables, including local soil properties (pH, resistivity, etc.), land use, climate, terrain and even socioeconomic status. With higher-granularity data, it will also be possible to predict anomaly densities (or other aspects of external corrosion condition) for smaller segments, or even pipe joints (Figure 7).

Figure 7 – High-resolution external corrosion prediction


This approach is now being developed further with the support of a major pipeline operator, and early results are extremely promising. As the accuracy of the predictions improves, they will be used to prioritize rehabilitation activities and, where appropriate, justify the costs of the modifications needed to allow internal inspection. There are many other potential applications, such as predicting the future condition of inspected lines, so look out for more examples in future articles. In addition, if you have any ideas for applications of the IDW, then do not hesitate to get in touch with the authors.


De Leon, C and Smith, M S (2020). Machine Learning to Support Risk and Integrity Management. Pipeline Pigging & Integrity Management (PPIM) Conference, February 2020, Houston, Texas, United States of America.



Roland Palmer-Jones 

Chief Operating Officer (COO) – Central Business
