Engineering

Detect, protect, scale: the future of anomaly detection across industries

Wednesday, January 29, 2025


Speak to an aspiring founder, and they’ll give you examples of how Airbnb, Uber, and Pinterest achieved success. Speak to a high school student about the importance of going to college, and they’ll be quick to hail Mark Zuckerberg, Bill Gates, Steve Jobs, and so many more. 

The outliers, the anomalies, are far more exciting, have more intriguing stories, and deserve more attention. They didn’t follow the rules. 

However, ask a CISO, a site manager at an oil rig, or a manager at a manufacturing plant, and they will tell you exactly how much they dislike outliers and anomalies. In industries where maintaining the status quo is everything, anomalies are not exciting. They are a reason to go into hyper-diligence mode and start reviewing your contingency plans. They are the reason you run “worst-case scenario” simulations. 

Surprisingly, anomaly detection has been around since the 1930s, around the time the industrial revolution was peaking in the Western world. 

A view of early foundations of anomaly detection

Unsurprisingly, anomaly detection has its roots in the manufacturing industry. Its earliest formal treatment appears in Walter A. Shewhart’s book “Economic Control of Quality of Manufactured Product.” Data values in a Gaussian distribution were classified as anomalies if they fell more than three standard deviations from the expected value. In the 1980s, Dorothy E. Denning, a pioneer in computer security, laid a stronger foundation with her work on intrusion detection. Then Bell Atlantic Science and Technology (now part of Verizon Communications) published a paper on activity monitoring that brought data mining techniques, baseline behaviors, and thresholding into the picture. Since then, anomaly detection has been used in healthcare (e.g., electrocardiogram analysis), by financial services companies, and by Uber, Netflix, Boeing, GE, and more. The information security industry has made, and continues to make, billions from anomaly detection.

What is the way forward in anomaly detection?

Artificial Intelligence (AI), Large Language Models (LLMs), blockchain, sensor networks, and massive increases in computational power have made it easier to mine data and detect anomalies in structured and unstructured data, in numerical data and documents, and in images and videos. 

However, one of the toughest conversations to have with clients is still around identifying the right model. Just because we can build a machine learning model or use generative AI and time series analysis doesn’t mean we should. 

How can rule-based engines help?

Rule-based engines use predefined if-then logic. These engines analyze interactions in real time and apply preset rules to highlight suspicious activities: for example, a banking transaction that exceeds a certain amount, originates from a high-risk location, or follows multiple failed login attempts. Rule-based engines are transparent, easy to interpret, and let you quickly implement new rules in response to emerging fraud trends. When you have ample, well-organized data, a rule-based engine will help you go to market faster.  

In industrial manufacturing, a rule-based anomaly detection engine can be effective for monitoring the temperature of a blast furnace in a steel plant. In this scenario, simple rules can capture most anomalies without needing complex machine learning algorithms. 

In a rule-based engine, the solution doesn’t need context. It doesn’t need to understand the intricacies of metallurgy or thermodynamics; it just needs to follow its checklist.

Here's how the rule-based system might work in a steel plant to alert site managers if there is a risk:

  • Normal operation: 1200°C to 1600°C
  • Warning level: 1600°C to 1700°C or 1100°C to 1200°C
  • Critical level: Above 1700°C or below 1100°C

This simple rule-based engine can effectively detect temperature anomalies in the blast furnace, allowing operators to take appropriate action quickly. It simply follows the predefined rules to identify when something is amiss. There are many use cases where a well-coded rule-based engine is effective, particularly in simple, preset-threshold anomaly detection scenarios. 
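To make this concrete, here is a minimal sketch of what such a rule-based check could look like, using the hypothetical thresholds from the list above; the function and sensor values are illustrative, not taken from an actual plant system.

```python
# Illustrative rule-based temperature check for a blast furnace.
# Thresholds mirror the hypothetical ranges listed above.

NORMAL_RANGE = (1200, 1600)   # °C
WARNING_RANGE = (1100, 1700)  # °C; outside NORMAL but inside this range is a warning


def classify_temperature(temp_c: float) -> str:
    """Return an alert level for a single temperature reading."""
    if NORMAL_RANGE[0] <= temp_c <= NORMAL_RANGE[1]:
        return "NORMAL"
    if WARNING_RANGE[0] <= temp_c <= WARNING_RANGE[1]:
        return "WARNING"
    return "CRITICAL"


def check_readings(readings: list[float]) -> list[tuple[float, str]]:
    """Flag every reading that is not in the normal operating range."""
    return [(t, classify_temperature(t)) for t in readings if classify_temperature(t) != "NORMAL"]


if __name__ == "__main__":
    sample = [1350.0, 1650.5, 1080.2, 1420.0, 1725.9]
    for temp, level in check_readings(sample):
        print(f"{temp:.1f}°C -> {level}")
```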

How did ML models revolutionize anomaly detection?

Machine learning models offered a more dynamic way to tackle anomaly detection. These solutions analyze massive datasets, recognize patterns, and identify anomalies, sometimes in real time. Zemoso has run proof-of-concepts (POCs) with: 

  • Logistic regression, which predicts the probability of a binary outcome (fraud or not, for example)
  • Decision trees and random forests, which break down complex problems into simpler parts to identify patterns
  • Neural networks, which use deep learning to recognize complex patterns

ML models adapt to new fraud tactics through continuous learning, becoming more effective over time. 
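As a rough illustration of the supervised flavor of this approach, the sketch below trains a logistic regression fraud classifier with scikit-learn; the feature names and synthetic data are invented for the example and do not describe an actual Zemoso engagement.

```python
# Sketch: logistic regression for binary fraud prediction on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Hypothetical features: [transaction_amount, failed_logins, is_high_risk_location]
X = np.column_stack([
    rng.exponential(scale=100, size=n),  # transaction amount
    rng.poisson(lam=0.2, size=n),        # failed login attempts before the transaction
    rng.integers(0, 2, size=n),          # high-risk location flag
])

# Synthetic label: fraud is more likely for large, high-risk, multi-failure transactions
logits = 0.01 * X[:, 0] + 1.5 * X[:, 1] + 1.0 * X[:, 2] - 4.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of fraud for a new transaction: $900, 3 failed logins, high-risk location
print(model.predict_proba([[900.0, 3, 1]])[0, 1])
print(model.score(X_test, y_test))  # held-out accuracy on the synthetic data
```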

For example, millions of calls, texts, and data transfers take place through a telecommunications network every day. Data usage fraud used to be a fairly common problem. Here, a well-crafted machine learning model can track data consumption and quickly flag unauthorized usage or data theft. 

It could track information such as overall usage and cross-reference it by time of day or day of week, the type of applications used, the geolocation of data access, and the volume of data consumed. It can recognize each user’s digital fingerprint and the unique patterns of their data usage. Anomalies, such as usage spikes or access from unusual locations, are flagged and validated. 

For example, let's say a user, Shiloh, typically uses 5GB of data a month, mostly for social media, email, and news applications in the morning. Suddenly, her account shows 100GB of data usage from video streaming at 3 AM from a location 1,000 miles away. The machine learning model would immediately flag this as suspicious activity.
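Here is a hedged sketch of how the unsupervised version of this idea could look with scikit-learn's IsolationForest; the usage features are hypothetical and the data is synthetic, not a description of an actual telecom system.

```python
# Sketch: unsupervised anomaly detection on per-session usage features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Hypothetical features per session: [GB used, hour of day, distance from home (miles)]
normal_sessions = np.column_stack([
    rng.normal(0.2, 0.1, size=500).clip(min=0),  # a few hundred MB per session
    rng.normal(9, 2, size=500) % 24,             # mostly mornings
    rng.normal(5, 3, size=500).clip(min=0),      # close to home
])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal_sessions)

# A Shiloh-style outlier: 100GB of streaming at 3 AM, 1,000 miles away
suspicious = np.array([[100.0, 3.0, 1000.0]])
print(model.predict(suspicious))  # -1 means the session is flagged as an anomaly
```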

What are k-means clustering and k-prototypes? 

K-means and k-prototypes are clustering algorithms used in machine learning and data analysis.
K-means is like a party planner who divides the guests into groups based on how they mingle: people with similar interests end up in the same cluster. In anomaly detection, it spots outliers that behave oddly compared to their neighbors, much like that one guest who keeps to themselves. By grouping “normal” events and activities together, any outlier stands out, allowing analysts to quickly focus on potential risks. BigQuery ML demonstrates this by clustering authentication events to highlight suspicious attempts. At its core, K-means clustering is an unsupervised learning technique that groups data points into K clusters based on similarities. 
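As a minimal, hedged sketch of that idea, the example below clusters synthetic login-style events with scikit-learn's KMeans (not the BigQuery ML implementation mentioned above) and flags points that sit far from their cluster center; the features and threshold are invented for illustration.

```python
# Sketch: k-means on numeric login features, flagging points far from their cluster center.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical features per login: [hour of day, failed attempts before success]
logins = np.column_stack([
    np.concatenate([rng.normal(9, 1, 300), rng.normal(20, 1, 300)]),  # two usual time windows
    rng.poisson(0.3, 600),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(logins)

# Distance of each point to its assigned cluster center; large distances look anomalous
distances = np.linalg.norm(logins - kmeans.cluster_centers_[kmeans.labels_], axis=1)
threshold = np.percentile(distances, 99)

odd_login = np.array([[3.0, 6.0]])  # 3 AM login with six failed attempts
odd_distance = np.min(np.linalg.norm(kmeans.cluster_centers_ - odd_login, axis=1))
print(odd_distance > threshold)     # True -> worth investigating
```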

K-prototypes builds on K-means. Assume that you also want to group guests by their dietary preferences, which are categorical attributes. K-means struggles with non-numerical data, but K-prototypes handles both numeric and categorical features. In anomaly detection, this matters when data includes transaction amounts, account types, or merchant categories. By respecting the either/or nature of categorical factors, K-prototypes creates clusters that truly reflect underlying risk patterns. Financial services, FinTech, and retail platforms use this algorithm because their data often mixes numerical values (purchase totals) with qualitative descriptors (card tier, transaction location). By segmenting large volumes of hybrid data, K-prototypes highlights the clusters most likely to harbor fraud, enabling more targeted investigations and resource allocation. In short, K-prototypes is the more versatile of the two because it handles categorical and mixed variables simultaneously. 
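A comparable hedged sketch for mixed data, assuming the open-source kmodes package and its KPrototypes class; the transaction fields, cluster count, and "small clusters are suspicious" heuristic are illustrative only.

```python
# Sketch: k-prototypes on mixed numeric + categorical transaction data.
import numpy as np
from kmodes.kprototypes import KPrototypes  # pip install kmodes

rng = np.random.default_rng(3)
n = 200

# Hypothetical columns: [purchase_total, card_tier, merchant_category]
X = np.empty((n, 3), dtype=object)
X[:, 0] = rng.exponential(scale=80, size=n).round(2)                  # numeric
X[:, 1] = rng.choice(["standard", "gold", "platinum"], size=n)        # categorical
X[:, 2] = rng.choice(["grocery", "travel", "electronics"], size=n)    # categorical

# Columns 1 and 2 are categorical; column 0 stays numeric
kproto = KPrototypes(n_clusters=3, init="Cao", random_state=0)
labels = kproto.fit_predict(X, categorical=[1, 2])

# Unusually small clusters are a reasonable place to start an investigation
clusters, counts = np.unique(labels, return_counts=True)
print(dict(zip(clusters.tolist(), counts.tolist())))
```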

When to use k-means clustering vs. k-prototypes?

Choose K-means when your data is primarily numerical, like continuous transaction amounts or user login durations, because it excels at grouping points with similar measurements. Use K-prototypes if your data is a blend of numerical and categorical attributes, such as different account types, merchant categories, or locations. Standard supervised machine learning or AI models often require labeled training data, but in many outlier scenarios, suspicious patterns are not well-defined or labeled. Unsupervised clustering approaches like K-means and K-prototypes can detect new, evolving threats without explicit examples, making them especially useful for dynamic, large-scale data where anomalies appear in unexpected forms and contexts.

Anomaly detection POC to test efficacy of multiple ML algorithms

In a recent POC for a software development firm, for example, Zemoso evaluated multiple correspondence analysis (MCA), factor analysis of mixed data (FAMD), k-means clustering, and k-prototypes. The purpose was to detect behavioral anomalies in access and user behavior. 

Initially, k-means was deployed using standard libraries (e.g., scikit-learn), but the presence of non-numeric attributes meant we had to convert categorical variables into numeric form. Methods like label encoding and one-hot encoding were tested. However, k-means did not yield strong clustering performance or provide meaningful anomaly explainability.

Recognizing that ignoring numeric features was not feasible for the anomaly specifications, we investigated k-modes for categorical attributes but found it unsuitable because timestamps were important. We then turned to k-prototypes, which accommodates both numeric and categorical data. While k-prototypes minimized a combined distance over numerical and categorical measurements, it still posed challenges in explainability.

Next, we explored MCA—particularly its ability to handle multiple categorical variables by mapping them into a multidimensional space. We tested two Python libraries, mca and prince. Ultimately, we chose the mca package due to its support for supplementary data transformation.
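As a rough illustration of the general technique (not the POC's actual pipeline), the sketch below projects categorical access-log attributes with MCA, here using the prince library, which was one of the two tested; the columns, threshold, and "distance from the centroid" heuristic are assumptions, and the exact API may differ across prince versions.

```python
# Sketch: MCA on categorical access-log attributes, flagging rows far from the bulk.
import numpy as np
import pandas as pd
import prince  # pip install prince

rng = np.random.default_rng(5)

# Hypothetical categorical event attributes
df = pd.DataFrame({
    "role": rng.choice(["engineer", "admin", "contractor"], size=300),
    "resource": rng.choice(["repo", "ci", "prod-db"], size=300),
    "time_bucket": rng.choice(["business-hours", "evening", "night"], size=300),
})

mca_model = prince.MCA(n_components=2, random_state=0).fit(df)
coords = mca_model.transform(df).to_numpy()  # 2-D coordinates per event

# Events far from the centroid of the projected cloud are candidate anomalies
dist = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
print(df[dist > np.quantile(dist, 0.99)])
```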

We also considered FAMD, available via prince, but did not proceed because it does not support supplementary data transformation.

LLMs and anomaly detection

Large Language Models (LLMs) offer unique advantages for anomaly detection in certain use cases, particularly where complex, unstructured, multi-modal data is involved. Here's an example where an LLM was used for anomaly detection in the wind energy industry. MIT researchers introduced the SigLLM framework, which uses LLMs to detect anomalies in wind turbines. This approach is particularly valuable because:

  • It can handle the vast amount of data generated by hundreds of turbines, each recording multiple signals every hour.
  • It doesn't require extensive training or machine learning expertise, making it easier for wind farm operators to maintain and update.
  • It can detect anomalies in real-world data that traditional deep-learning models missed.

Time-series and LLMs: LLMs have shown promising results in time-series anomaly detection. The SigLLM framework, for instance, leverages LLMs to analyze time-series data from wind turbines. LLMs, in that scenario, were used to forecast future values in a time series, with significant deviations from predictions indicating potential anomalies. They understood and interpreted complex temporal patterns that might be difficult for traditional time-series models to capture. They incorporated contextual information from unstructured data sources to enhance time-series analysis.

This capability is particularly useful in scenarios where anomalies may depend on complex, long-term patterns or where additional contextual information is crucial for accurate detection.
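As a loose illustration of the forecast-and-compare pattern (not SigLLM's actual implementation), the sketch below prompts a chat-completion model to predict the next readings of a sensor series and flags large deviations from what was observed; the model name, prompt, tolerance, and readings are all assumptions, and a production system would need more robust parsing of the model's response.

```python
# Sketch: LLM-assisted anomaly check via forecasting (forecast-and-compare pattern).
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def forecast_next_values(history: list[float], horizon: int = 3) -> list[float]:
    """Ask the model to continue a numeric series; returns `horizon` predicted values."""
    prompt = (
        "Here are hourly turbine vibration readings: "
        f"{', '.join(f'{v:.2f}' for v in history)}. "
        f"Predict the next {horizon} readings. "
        "Respond with a JSON array of numbers only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)


def flag_anomalies(predicted: list[float], actual: list[float], tolerance: float = 0.5) -> list[int]:
    """Return indices where the observed value deviates from the forecast by more than `tolerance`."""
    return [i for i, (p, a) in enumerate(zip(predicted, actual)) if abs(p - a) > tolerance]


if __name__ == "__main__":
    history = [1.02, 1.05, 0.98, 1.01, 1.04, 1.00]
    actual_next = [1.03, 2.40, 1.01]  # the 2.40 spike should stand out
    predicted = forecast_next_values(history, horizon=3)
    print(flag_anomalies(predicted, actual_next))
```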

LLMs are better than traditional ML models in scenarios where:

  • There's a lack of labeled training data.
  • The system needs to adapt quickly to new types of anomalies.
  • The data is complex and unstructured.
  • Time-series data involves complex, long-term dependencies or requires integration with contextual information.

Traditional ML algorithms are perfectly effective when structured data with well-defined patterns is readily available. Until a week ago, computational efficiency and cost were also factors, but DeepSeek is now raising questions about that. For time-series analysis, LLMs aren’t needed when the patterns are straightforward and short-term. 

Imagine a large hospital network using a powerful LLM for multi-modal anomaly detection in patient records. This system simultaneously analyzes unstructured text from doctors’ notes and patient complaints, structured data such as lab results and vital signs, medical images like X-rays and MRIs, and time-series data from continuous monitoring devices. LLMs are especially suited for this scenario because they can process and understand unstructured text more effectively than traditional ML models, integrate information from multiple modalities in a manner resembling human reasoning, adapt to new medical knowledge without extensive retraining, and capture complex temporal patterns that might go unnoticed with conventional time-series models.

P.S. This blog post is informed by a multitude of design and engineering proof-of-concepts and products developed by Zemoso product design and engineering teams for its startup and enterprise clients. To learn more about specific instances, please don’t hesitate to reach out. 
