
North America’s water and wastewater infrastructure is aging and under strain. The American Society of Civil Engineers consistently grades U.S. drinking water, wastewater and stormwater infrastructure at near failing levels. Globally, many utilities face similar challenges of frequent pipe bursts, equipment failures and rising operational costs. In this context, enterprise asset management (EAM) and asset performance management (APM) systems augmented with AI-driven predictive analytics have emerged as potential game changers. These systems promise to move utilities from a reactive “fix it when it breaks” approach to a proactive maintenance strategy, detecting problems before they cause service disruptions.
EAM, APM & AI/Machine Learning
Traditional EAM software helps catalog assets, schedule maintenance and manage work orders. APM extends this by focusing on asset health and performance, often incorporating advanced tools like sensors, real-time monitoring and predictive algorithms. AI/machine learning (ML)-based failure prediction is promoted as a way to reduce unplanned downtime, optimize maintenance spending and extend asset life.
Utility directors are bombarded with bold claims about AI’s ability to predict failures weeks or months in advance, avoid costly emergencies and yield high returns on investment. While success stories are plentiful in vendor marketing materials, utility directors must understand the real-world limitations and implementation challenges of these systems, particularly for vertical assets like pumps, motors and treatment equipment.
The Promise vs. Reality
AI-driven failure prediction systems analyze data from sensors on critical equipment to identify patterns that precede breakdowns. When implemented effectively, they can deliver significant benefits. One wastewater treatment facility, for example, saved approximately $45,000 by preventing a major equipment failure during a six-month pilot program. This single prevented breakdown paid for roughly two years of their predictive maintenance service.
However, the journey from concept to reliable prediction is more complex than many vendors suggest. Here’s what utility leaders should know:
Data Quality: The Foundation That Often Crumbles
The adage “garbage in, garbage out” applies forcefully to AI prediction systems. Many utilities find their historical data is sparse, inconsistent or not in a usable format. Sensor readings might have gaps, failures might not have been logged with precise timestamps or different pumps might measure different parameters.
For utilities with decades-old equipment and limited sensor infrastructure, this presents a significant hurdle. Insufficient data is a major limitation. If an AI model has few examples of prior failures or only limited sensor trends, its predictions will be unreliable. In fact, some types of failures are so rare (a pump might fail catastrophically only once in 10 years) that ML systems struggle to statistically learn these patterns.
For vertical assets like pumps and motors, vibration analytics require high-frequency sampling rates (often >1 kilohertz [kHz]) and multiple measurement points per asset to establish valid baselines. Industry analyses emphasize that “data quality and availability are paramount to success, and incorrect or lacking data can lead to unreliable predictions or even incorrect maintenance actions.”1 This fundamental limitation means utilities with sparse historical data or insufficient sensor deployments will experience substantially reduced prediction accuracy regardless of algorithm sophistication.
In the end, adding AI to bad data just delivers the wrong prediction faster.
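To make the baseline requirement concrete, the sketch below shows one simple way a vibration "fingerprint" can be built and used. All of it is illustrative: the signals are synthetic, and the 2 kHz sampling rate, 100-500 Hz band and 3-sigma threshold are assumptions chosen for the example, not recommendations.

```python
import numpy as np

FS = 2_000  # sampling rate in Hz (above the >1 kHz guidance cited in the text)
N = FS      # one second of samples per reading

def band_rms(signal, lo_hz, hi_hz):
    """RMS spectral amplitude of `signal` within a frequency band."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    return np.sqrt(np.mean(spectrum[mask] ** 2))

rng = np.random.default_rng(0)
t = np.arange(N) / FS

# Baseline from 30 synthetic "healthy" readings: a 60 Hz running tone plus noise.
healthy = [np.sin(2 * np.pi * 60 * t) + 0.1 * rng.standard_normal(N)
           for _ in range(30)]
baseline = np.array([band_rms(s, 100, 500) for s in healthy])
threshold = baseline.mean() + 3 * baseline.std()

# A new reading with an emerging 180 Hz defect tone lands inside the band.
faulty = (np.sin(2 * np.pi * 60 * t)
          + 0.5 * np.sin(2 * np.pi * 180 * t)
          + 0.1 * rng.standard_normal(N))
print(band_rms(faulty, 100, 500) > threshold)  # → True
```

Even this toy example needs 30 clean baseline readings before the threshold means anything, which is the point: without enough trustworthy healthy data per measurement point, there is no fingerprint to deviate from.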
The Complexity Challenge
A documented limitation across wastewater implementations is difficulty handling complex multivariate environments. Current AI-driven asset failure prediction relies on three primary ML approaches:
- Anomaly detection: Algorithms establish statistical “normal” baselines (termed “golden fingerprints” in some implementations) and flag deviations. These systems perform adequately for stable operations but struggle with variable conditions.
- Supervised learning pattern recognition: Models trained on labeled historical failures identify developing fault signatures. These require substantial examples of each failure mode—problematic for catastrophic failures that occur rarely (e.g., once per decade).
- Trend analytics with dynamic thresholds: Statistical projections are created based on developing patterns. The technical challenge is distinguishing between normal operational variability and genuine fault progression.
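A minimal sketch of the anomaly-detection idea with a dynamic threshold might look like the following. The readings are synthetic stand-ins for a pump-current trend, and the window size and z-score limit are illustrative assumptions.

```python
import statistics

def rolling_anomalies(readings, window=20, z_limit=3.0):
    """Flag readings that deviate from a rolling statistical baseline.

    Returns the indices whose z-score against the trailing window
    exceeds z_limit.
    """
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mean = statistics.fmean(recent)
        stdev = statistics.pstdev(recent) or 1e-9  # avoid divide-by-zero
        if abs(readings[i] - mean) / stdev > z_limit:
            flagged.append(i)
    return flagged

# Stable readings with one sudden excursion at index 30.
data = [10.0 + 0.1 * ((-1) ** i) for i in range(30)] + [12.5] + [10.0] * 9
print(rolling_anomalies(data))  # → [30]
```

Note that after the excursion, the outlier sits inside the rolling window and inflates its standard deviation, desensitizing the detector for the next 20 readings. That small artifact is a miniature version of the tuning burden described above: thresholds and windows need ongoing adjustment against real operating behavior.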
Water and wastewater systems operate under highly variable conditions and can experience widely fluctuating flows and loads depending on time of day or weather events, which can confound predictive models.
One wastewater case study found it “extremely difficult” to get a clear indication for cleaning aeration equipment because of numerous influencing variables like pH, temperature and time of day. This means AI might struggle to distinguish whether a change in performance indicates a developing fault or just normal operational variation.
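One common mitigation is to model the known influences on healthy data and score the residual rather than the raw signal. The sketch below uses a synthetic diurnal pattern as a stand-in for time-of-day effects; all magnitudes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(240) % 24                      # ten days of hourly readings
diurnal = 2.0 * np.sin(2 * np.pi * hours / 24)   # flow-driven daily swing
reading = 50 + diurnal + 0.2 * rng.standard_normal(240)
reading[200:] -= 1.5                             # a genuine but subtle fault

healthy = slice(0, 200)

# Naive thresholding: the +/-2.0 daily swing buries the 1.5-unit fault.
raw_dev = np.abs(reading - reading[healthy].mean())
raw_flags = int((raw_dev[200:] > 3 * reading[healthy].std()).sum())

# Fit the diurnal pattern on healthy data, subtract it, threshold the residual.
X = np.column_stack([np.sin(2 * np.pi * hours / 24),
                     np.cos(2 * np.pi * hours / 24),
                     np.ones(240)])
coef, *_ = np.linalg.lstsq(X[healthy], reading[healthy], rcond=None)
residual = reading - X @ coef
res_flags = int((np.abs(residual[200:]) > 3 * residual[healthy].std()).sum())

print(raw_flags, res_flags)  # raw misses the fault; the residual exposes it
```

The catch, as the aeration case study suggests, is that this only works when the confounders (pH, temperature, time of day) are measured, well understood and reasonably stable in their effect, which is rarely a given in variable wastewater conditions.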
The Implementation Timeline Reality
Marketing materials in this space frequently cite 99% failure prediction rates and “set it and forget it” automation capabilities. Contrary to these “plug-and-play” claims, there is a substantial lead time to set up, integrate and train these systems. A user cannot simply deploy an ML model without data preparation and expect accurate predictions on day one.
Most utilities report needing a minimum of six to 12 months of data gathering just to establish baseline equipment behavior. Achieving a high-confidence, reliable prediction system typically takes one to two years from project kickoff, depending on the complexity of assets and the frequency of failure events to learn from.
False Alarms & Missed Failures
Predictive models inevitably produce false positives (predicting failures that do not happen) and false negatives (missing actual failures). Current systems optimize for sensitivity (catching potential failures) at the expense of specificity (avoiding false alarms). This engineering decision reflects the higher consequence of missed failures but creates operational challenges. Technical teams must account for:
- Initial false alarm rates of 30%-50% during system learning phases
- The necessity of human validation of ML outputs before maintenance actions
- Ongoing algorithm tuning and threshold adjustments based on operational feedback
- Specific asset classes with poor prediction performance due to nonprogressive failure modes
The frequent false alarms early in deployment can erode trust. If not managed properly, crews can develop alarm fatigue and start ignoring warnings (the “boy who cried wolf” effect), potentially missing significant events.
No system achieves the 99% accuracy that is sometimes promised. A more realistic expectation is a reduction in unplanned outages (perhaps 50%-70%), but not elimination of all failures.
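The arithmetic behind this trade-off is worth internalizing: because genuine failures are rare, even excellent specificity leaves a large share of false alarms among the alerts actually raised. The counts below are hypothetical, chosen only to illustrate the base-rate effect.

```python
def alert_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and the share of alerts that are false alarms."""
    sensitivity = tp / (tp + fn)       # fraction of real failures caught
    specificity = tn / (tn + fp)       # fraction of healthy checks not alarmed
    false_alarm_share = fp / (tp + fp) # fraction of raised alerts that were wrong
    return sensitivity, specificity, false_alarm_share

# Hypothetical year of weekly health checks on 100 assets (5,200 checks),
# with 10 genuine impending failures. Illustrative numbers only.
sens, spec, far = alert_metrics(tp=9, fp=9, fn=1, tn=5181)
print(f"sensitivity={sens:.0%} specificity={spec:.1%} false alarms={far:.0%}")
# → sensitivity=90% specificity=99.8% false alarms=50%
```

Here a system that is wrong on fewer than 0.2% of healthy checks still produces one false alarm for every real one, simply because healthy checks vastly outnumber failures. That is why the 30%-50% early false-alarm rates above are unsurprising, not a sign of a broken model.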
Integration With Legacy Systems
Utilities often operate with decades-old equipment and software. Getting data out of these systems or adding sensors to old assets can be technically challenging. Data silos also pose problems; maintenance data might reside in a separate database from operations data.
Without extensive IT integration projects, the predictive maintenance tool might lack the full context needed to make accurate predictions, severely limiting its effectiveness.
The Workforce & Skills Gap
Introducing AI/ML into a traditionally mechanical/electrical maintenance department faces human challenges. Staff may distrust a “black box” algorithm or fear that AI will replace jobs. Additionally, existing technicians might lack experience interpreting data trends or understanding predictive algorithms.
Contrary to marketing claims that “the AI does it all,” successful implementations require seasoned operators and maintenance engineers to be deeply involved to train, validate and guide the AI system. AI augments human decision-making; it does not replace the need for skilled personnel who understand why equipment might be failing.
Moving Forward Realistically
AI-driven asset failure prediction holds genuine potential to advance water utility operations, but it requires a clear-eyed view of its limitations. For utility leaders considering this technology:
- Start with the basics: Ensure your asset registry, maintenance records and sensor infrastructure are solid before adding predictive AI.
- Choose high-impact use cases: Target assets whose failure causes major disruptions or chronic issues that drive high costs. Complete a risk and criticality analysis to identify critical systems and assets prior to deploying AI/ML.
- Pilot before scaling: Begin with a limited implementation to learn and build organizational buy-in.
- Invest in people and processes: Train staff and update procedures to incorporate predictive maintenance workflows.
- Set realistic expectations: Understand that the technology will eventually reduce, not eliminate, unexpected failures.
Finally, given the effort and time required before AI/ML systems reliably deliver accurate failure predictions, carefully evaluate whether the realistic improvement offers a large enough gain in the ability to intervene before an asset fails. It is possible, perhaps even likely, that failure forecasts anchored in first-principles, physics-based asset models can identify potential failures with more than sufficient time to act.
References
1. Wiese, “Predictive Maintenance Using Artificial Intelligence in Critical Infrastructure: A Decision-Making Framework,” Int. J. Eng. Bus. Manag. (2024).