The inability to predict what needs maintenance in remote energy plants (be it a windmill or an oil rig) costs more than lost operational efficiency: regulatory violations, fines, reputational damage, and adverse environmental impact are all at stake.
For Prospect Energy, it meant choosing not to operate in Colorado after a $1.7 million fine. Columbia Gas Transmission in 2008 and 2009, Sunoco Inc. R&M in 2010, BP Pipeline Co in 2004, Gulf South Pipeline Company in 2011, Amoco Oil Company in 2005, Southern Natural Gas in 2007 and 2010, and the Deepwater Horizon spill in 2010 all drew heavy regulatory sanctions, and the companies involved incurred significant costs because they failed to predict and proactively maintain equipment, as documented by the U.S. Department of Transportation’s Pipeline and Hazardous Materials Safety Administration.
About a decade ago, a major player in the oil and gas industry recognized the need to become more proactive and enlisted Zemoso's help to build a solution for tracking and predicting maintenance requests using the rudimentary artificial intelligence (AI) available at the time. We used data fed from drones and sensors to monitor operations and generate recommendations.
However, the Internet of Things (IoT), drone technology, and AI have come a long way since, especially with large language models (LLMs) entering the fray two years ago. Our technology incubation team therefore ran a few proofs of concept (POCs), testing several predictive models against each other with synthetic data for use in the utilities and energy industries.
As global energy and connectivity demands surge, energy and utility companies must maintain the critical equipment that powers our world. According to a global survey conducted in 2024, involving 1,165 MRO (Maintenance, Repair, and Operations) professionals across various industries including manufacturing, retail, and hospitality, only 30% of facilities actively use predictive maintenance.
Companies are reluctant to adopt despite its benefits due to the substantial initial cost of adoption and setup. Many of these companies also often lack the technological expertise to create, iterate, and deploy a phygital solution for predictive maintenance.
For wind turbines, for example, a failed turbine can lose $2,000 per day, if not more, and a trip to one of these remote farms can cost upwards of $5,000 per day.
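To make the stakes concrete, here is a back-of-the-envelope calculation using the figures quoted above. The function and its default values are illustrative assumptions, not billing data:

```python
def downtime_cost(days_down, lost_revenue_per_day=2_000,
                  trip_cost_per_day=5_000, trip_days=1):
    """Total cost = lost generation revenue + cost of the repair trip.
    Defaults use the illustrative figures quoted in this post."""
    return days_down * lost_revenue_per_day + trip_days * trip_cost_per_day

# A turbine idle for a week while a one-day repair trip is arranged:
print(downtime_cost(days_down=7))  # 7 * 2,000 + 1 * 5,000 = 19,000
```

Even this simple model shows why predicting the failure, and bundling the repair into an already-scheduled site visit, pays for itself quickly.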
The solutions we have built in the past combine historical data already in the systems with real-time data collected by drones and sensors to generate accurate predictions.
Despite some advancements in the right direction to digitize, most organizations still struggle with (1) adapting out-of-the-box solutions to their complex, specific use cases and (2) reducing false positives. They rely on partners who have sufficient industry experience to take the variability and complexity of their situation into consideration while building these solutions.
There are many architectures that enable predictive maintenance for oil and gas, utilities, and other energy companies. Most software engineering leaders will agree that training a time series machine learning model is complex, and that these models differ substantially from standard classification or prediction models. Even deep learning models such as LSTMs have their limitations. Can a GPT-like model for time series make a difference? We wanted to explore this option and conducted a POC and experiments using synthetic data. The results were encouraging.
TimeGPT is a generative pre-trained transformer model designed specifically for time series forecasting. As companies try to reduce time to market, the fact that it is production-ready also helps. It adapts the transformer architecture, originally developed for natural language processing (NLP), to handle the sequential and temporal data common in time series.
We are not discounting the limitations that most early systems like these share: potential inaccuracies in very complex scenarios, dependence on training data quality, difficulty identifying and accounting for outliers, and opacity around some of the model’s inner workings. Even so, it is worth exploring for the following reasons.
Based on our understanding of the transformer architecture, we expected that TimeGPT (in combination with vector databases) would be able to capture nonlinear relationships and recognize complex patterns with even limited datasets. As a result, it would apply to and could work well across subparts of complex machinery and complex rigs, machine floors, and energy plant setups. It could then reduce the amount of pre-processing the data needs and self-resolve missing values or irregular time intervals. It could also reduce the impact of data quality or labeling problems on forecasting.
With the above hypothesis in mind, we set about to figure out how a company in the energy or utilities sector could quickly deploy a predictive maintenance solution while still using its existing sensor and drone infrastructure to predict what will need maintenance and when.
1. Forecasting using TimeGPT: TimeGPT is a pre-trained transformer built specifically for, as the name suggests, time series data. If we feed historical operational data for equipment such as wind turbines into TimeGPT, it can forecast the operational time series and flag potential faults with greater accuracy.
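As a sketch of how this might be wired up: Nixtla’s TimeGPT client expects series in a “long” format with a timestamp column and a target column. The sensor field, the drifting values, and the column names below are assumptions for illustration, based on the publicly documented client API, not our production code; the forecast call itself needs an API key, so it is shown but not executed.

```python
import pandas as pd

# Hypothetical hourly vibration readings from one turbine, in the
# long format (timestamp column "ds", target column "y") that
# Nixtla's TimeGPT client expects.
df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=168, freq="h"),  # one week
    "y": [0.5 + 0.01 * i for i in range(168)],  # slowly drifting vibration
})

# The actual forecast call (requires a Nixtla API key, so not run here):
# from nixtla import NixtlaClient
# client = NixtlaClient(api_key="...")
# fcst = client.forecast(df=df, h=24, time_col="ds", target_col="y")
# A sustained forecast above a vibration threshold would flag the
# turbine for maintenance before it fails.

print(df.shape)  # (168, 2)
```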
2. False positive classification: We tested classifying false positives using a vector database, relying on its inbuilt pattern recognition capabilities. We could also use a temporal vector database, such as Weaviate with versioning and temporal capabilities, to store time series data with nearly 90% compression. We classified false positives using two vector spaces.
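The two-vector-space idea can be sketched with plain cosine similarity. Here NumPy arrays stand in for the vector database, and the 4-dimensional embeddings and alert vectors are made-up toy values; a real deployment would embed alert windows and query a store such as Weaviate instead.

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two "vector spaces" of historical alert embeddings (toy 4-d vectors;
# in practice these would live in a vector database):
true_faults     = np.array([[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1]])
false_positives = np.array([[0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]])

def classify_alert(embedding):
    """Label a new alert by its nearest historical neighbors."""
    best_true  = max(cosine_sim(embedding, v) for v in true_faults)
    best_false = max(cosine_sim(embedding, v) for v in false_positives)
    return "fault" if best_true >= best_false else "false_positive"

print(classify_alert(np.array([0.85, 0.9, 0.15, 0.1])))   # fault
print(classify_alert(np.array([0.15, 0.1, 0.95, 0.85])))  # false_positive
```

The appeal of this approach is that suppressing a recurring false alert only requires adding its embedding to the false-positive space, with no retraining.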
LSTM (combined category training and individual category training): Long short-term memory (LSTM) is a kind of recurrent neural network that captures long-term dependencies in time series or sequential data. It uses memory cells and gates to control information flow, allowing it to retain or discard information over time. A well-built LSTM model can decode long-term dependencies and mitigate vanishing gradient problems with the right gating mechanisms and additive gradient structures. However, these models find very long sequences challenging and can be computationally intensive. They also struggle with non-sequential inputs, which sometimes may be the case. We evaluated combined category training, which lets a model learn patterns across multiple categories, and individual category training, which focuses on specific patterns within a single category. While combined-category training is better for capturing broader relationships, single-category training can be more effective for specialized tasks.
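The gating described above can be made concrete with a single LSTM cell step. This is a NumPy sketch with random toy weights, not a trained model; its only purpose is to show how the forget, input, and output gates update the memory cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates decide what to forget, write, and expose.
    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)."""
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])       # forget gate: what to discard from memory
    i = sigmoid(z[H:2*H])     # input gate: what new information to store
    o = sigmoid(z[2*H:3*H])   # output gate: what to expose as output
    g = np.tanh(z[3*H:4*H])   # candidate memory content
    c = f * c_prev + i * g    # additive update (mitigates vanishing gradients)
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
H, D = 3, 2  # toy hidden size and input size
h, c = np.zeros(H), np.zeros(H)
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
for x in rng.normal(size=(5, D)):  # run five time steps of "sensor readings"
    h, c = lstm_cell_step(x, h, c, W, U, b)
print(h.shape)  # (3,)
```

The additive form of the cell update `c = f * c_prev + i * g` is what lets gradients flow over long ranges, and it is also why very long sequences eventually become expensive: every step must be processed in order.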
LSTM-GAN: The LSTM-GAN architecture combines LSTM networks with generative adversarial networks (GANs) to improve data augmentation and model performance across different predictive tasks. This architecture included a generator that used LSTM layers to process sequential data and generate synthetic samples, and a discriminator that evaluated the authenticity of those samples. It captured temporal dependencies in sequential data and improved the quality of generated samples. By combining the GAN’s data augmentation and synthetic data generation capabilities with the LSTM’s ability to handle sequential data, this architecture outperformed the LSTM models. It demonstrated the potential of combining recurrent neural networks with adversarial training to tackle complex tasks involving sequential data and generative modeling.
TFT: The Temporal Fusion Transformer (TFT) is a more advanced neural network architecture that is very effective for multi-horizon forecasting tasks, where predictions for several different future time points are of paramount importance. This is operationally valuable in the energy and utilities industry, where prioritization and understanding the criticality of a problem matter. The TFT captures long-term dependencies, predicts multiple future time points, uses gating layers to control information flow and suppress unnecessary components, and can identify the features most relevant to its predictions. The architecture typically contains a static covariate encoder, a temporal covariate encoder, a self-attention decoder, and gated residual networks, and it can handle static, known-future, and historical inputs.
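The gating layers mentioned above are built from gated residual networks (GRNs). The sketch below is a simplified NumPy rendering of the idea with random toy weights; the real TFT GRN also applies layer normalization and a full GLU, which are noted in comments rather than implemented:

```python
import numpy as np

def elu(x): return np.where(x > 0, x, np.exp(x) - 1)
def sigmoid(x): return 1 / (1 + np.exp(-x))

def gated_residual_network(x, W1, b1, W2, b2, Wg, bg):
    """Simplified TFT-style GRN: a gate scales the transformed input,
    and the residual connection lets the block suppress itself entirely
    (gate near 0 means the output is essentially just x)."""
    eta = W2 @ elu(W1 @ x + b1) + b2   # two-layer nonlinear transform
    gate = sigmoid(Wg @ eta + bg)      # gate values in (0, 1)
    out = x + gate * eta               # residual + gated update
    # (the full TFT applies layer normalization here and uses a GLU)
    return out

rng = np.random.default_rng(1)
d = 4
x = rng.normal(size=d)
W1, W2, Wg = (rng.normal(size=(d, d)) for _ in range(3))
b1 = b2 = bg = np.zeros(d)
y = gated_residual_network(x, W1, b1, W2, b2, Wg, bg)
print(y.shape)  # (4,)
```

This self-suppression is what lets the TFT ignore irrelevant inputs per sample, which is a large part of its interpretability story.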
While these models are powerful, the TFT Model has limitations in handling extremely long-term dependencies and depends on carefully engineered input features. It may struggle with highly irregular time series or when given sudden, unprecedented changes in the data patterns. The model's performance can also be sensitive to hyperparameter tuning and requires significant computational resources for training and inference.
The LSTM Model is limited by its ability to capture very long-range dependencies and depends on the quality and quantity of training data. It can struggle with the vanishing gradient problem over extremely long sequences and may have difficulty in parallel processing, which can slow down training and inference times. LSTM models cannot easily incorporate non-sequential inputs or global context without additional architectural modifications.
The LSTM-GAN model has limitations in training stability and mode collapse and depends on careful balancing of the generator and discriminator networks. It can be challenging to achieve convergence, especially for complex sequential data. The model may also have trouble with interpretability, making it difficult to understand the reasoning behind its predictions or generations. Additionally, LSTM-GANs can be computationally intensive and could require extensive fine-tuning to achieve optimal performance for specific tasks.
The LSTM architecture with combined category training was able to predict general faults, but it couldn’t identify fault types, predict multiple faults simultaneously, or effectively forecast maintenance timing. The LSTM architecture with individual category training was better, about 10% more accurate than the combined-category architecture, but it required substantial computational resources and was less efficient to set up. While it excelled at predicting failures in real time, it couldn’t predict future maintenance needs.
The LSTM-GAN architecture was more impactful than either LSTM architecture individually, but it relies heavily on the quality of the original data. Without enough high-quality original data covering all fault types, the synthetic data it generates will be suspect and the model will likely be less effective. This challenge becomes especially prominent with new parts or equipment. TFT ran into similar challenges: it captured temporal patterns and provided more accurate forecasts, but the scarcity of fault data limited its overall performance. It needed far more original fault data in each fault type to be truly effective and accurate.
Things changed, however, when we combined TimeGPT with a vector database. The TimeGPT + Vector Database architecture combines the powerful temporal modeling capabilities of TimeGPT with the efficient similarity search and retrieval mechanisms of vector databases. This model architecture leverages TimeGPT's ability to handle complex time series patterns and multi-horizon forecasting, while vector databases provide fast and scalable storage and retrieval of high-dimensional data.
The combination allows for improved processing of large-scale time series data, enabling more robust and versatile analysis, and forecasting solutions. By integrating TimeGPT's temporal understanding with vector databases' ability to handle diverse data types and perform rapid similarity searches, this architecture can overcome limitations in handling extremely long-term dependencies and processing highly domain-specific forecasting tasks. The TimeGPT + Vector Database approach demonstrates the potential of combining advanced transformer-based models with specialized data storage solutions to address complex time series challenges, offering enhanced performance in tasks such as trend detection, anomaly identification, and real-time analytics.
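The end-to-end flow can be sketched in a few lines. Everything here is a stand-in under stated assumptions: `naive_forecast` is a trivial trend extrapolation playing the role of TimeGPT, the false-positive "bank" is a plain list playing the role of the vector database, and the threshold, readings, and stored pattern are invented toy values.

```python
import numpy as np

THRESHOLD = 1.0  # vibration level above which an alert is raised (assumed)

def naive_forecast(series, horizon):
    """Stand-in for TimeGPT: extrapolate the recent linear trend."""
    slope = (series[-1] - series[-8]) / 7.0
    return series[-1] + slope * np.arange(1, horizon + 1)

def is_known_false_positive(window, fp_bank, sim_cutoff=0.99):
    """Vector-DB stand-in: cosine similarity against stored benign patterns."""
    w = window / np.linalg.norm(window)
    return any(np.dot(w, v / np.linalg.norm(v)) > sim_cutoff for v in fp_bank)

history = np.linspace(0.5, 0.95, 24)  # 24 readings with a rising trend
fp_bank = [np.array([0.1, 0.1, 2.0, 0.1, 0.1, 0.1])]  # a known benign spike
fcst = naive_forecast(history, horizon=6)

# Alert only if the forecast breaches the threshold AND the pattern does
# not match a stored false positive:
if fcst.max() > THRESHOLD and not is_known_false_positive(fcst, fp_bank):
    print("schedule maintenance")
```

The forecaster supplies the "will this breach a limit, and when" answer; the similarity lookup supplies the "have we seen this pattern turn out to be benign before" answer. In our POCs, it was this second step that drove the reduction in false alerts.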
In our POCs, we observed that this architecture predicted real-time and future faults with 10% more accuracy than any of the other models. For fault category classification, the accuracy improved incrementally as we continued to add synthetic data for each fault category.
The architecture significantly improved fault detection accuracy with an equal proportion of faulty and non-faulty data. It improved fault classification when the dataset was augmented with synthetic data and outperformed all previous models. It also reduced the likelihood of false alerts and enabled more reliable maintenance scheduling.
Predictive maintenance is crucial for energy and utility companies to ensure operational efficiency, reduce downtime, and comply with regulations. Advanced models like LSTM, GAN, and TFT can help these industries transition from reactive to proactive maintenance strategies and digitize faster. TimeGPT combined with vector databases can enable even more powerful predictive maintenance solutions.
If you’ve used any of these models, we’d love to hear about their effectiveness and your experience.
In our POCs, we used synthetic wind turbine data from Kaggle, generated using Microsoft Azure’s Predictive Maintenance Template. Ideally, one would run model comparison experiments on a combination of real-world data and synthetic augmentation to make the ML models more robust. However, because real-world data is often proprietary and sometimes unavailable, we focused on the “relative” performance differences, rather than absolute numbers, to reach our conclusions.
We recognize that the absence of good, stringent inspection is hardest on the environment and living beings. For this blog, however, we have focused on the cost to utility and energy companies.
Yohanes Nuwara: https://www.kaggle.com/code/yohanesnuwara/iiot-wind-turbine-analytics/notebook
©2024 Zemoso Technologies. All rights reserved.