An introduction to drift in Machine Learning

Reading Time: 6 minutes

The only constant in life is change.

– Heraclitus.

The COVID-19 pandemic disrupted the world and created extraordinary problems: shortages of hospital beds, a spike in demand for medical professionals, the Great Resignation, rising poverty, surges in demand for everyday resources, closed international borders, and much more.

This rapidly evolving environment posed an unprecedented challenge to how businesses operate. For most business scenarios, the change is visible simply by comparing past data with current data.

Unsurprisingly, the data gathered during this period shifted, and with it the behavior of the Machine Learning (ML) models trained on that data. This shift in the underlying behavior of an ML system is called drift.

Before release, ML models are usually trained on well-analyzed data: the inputs are cleaned, carefully filtered, and engineered under controlled conditions. Once the model is live in production, however, it is exposed to real-world data, which tends to be dynamic and bound to change with time. This exposure leads to a gradual or sudden decay in model performance or metrics. This loss of predictive power is called model drift.

Why does Model Drift occur?

There are potentially three reasons for model drift in machine learning.

1. When the underlying behavior of the data changes due to some external events

These data changes can include events like recession, war, pandemics, etc. For example, during and post-COVID-19, people’s preferences and capacities for movement changed suddenly. A steep increase in the desire for personal immunity led to increased demand for natural/artificial immunity booster products.

2. When something goes wrong with the ML production setup, like a minor upgrade or enhancement

Errors at the ground level lead to changes in incoming data values. For example, a slight change to a website lets the end user enter their age manually rather than selecting from predefined drop-down values, without proper boundary checks for null/maximum values.

3. When the model’s training and application are misaligned

In this case, the ML model is trained for one context and incorrectly applied in a different one. For example, a Natural Language Processing (NLP) model trained on a corpus of Wikipedia data is used for writing news articles. This is a poor fit because of the risk of using data unsuited to the new context.

Model Drift timeline

Once a model is deployed in production, drift can also be categorized by its pattern over time and its magnitude.

Gradual Model Drift

A classic example is the changing behavior of fraudulent customers trying to beat Artificial Intelligence (AI)-based anti-money-laundering systems. Simply put, as fraudsters become more aware of the existing AI/ML detection rules, they change their strategies to evade the system and fool the model. Model quality therefore degrades as more and more fraudsters adapt their strategies over time.

Sudden Model Drift

When there is a sudden change in consumer behavior, the model trained on historical data won’t be able to account for the impact. During the pandemic, people started spending more on over-the-counter drugs for colds and coughs than earlier. Thus, a forecasting model trained on pre-pandemic data won’t be able to capture this underlying scenario.

Recurring Model Drift

This is more like drift seasonality: there is a behavior change, but it is repetitive or recurring. In the historical data of any online retailer such as Flipkart or Amazon, there is always a surge in the number of orders placed during festive sale seasons. A model that does not account for this recurring pattern will drift in production each time the surge returns.

Incremental Model Drift

When drift occurs continuously but its magnitude increases with time, it is classified as incremental. A possible example is view time on viral advertisement content: over time, the average predicted view time gradually shifts further away from the actual view time for a given week or month.

Blips in the Model Drift

You can think of blips as a noisier pattern, where the drift is more random in both time interval and magnitude.

Fig 1 – Drift classification based on speed and magnitude

Types of Model Drift

Drift generally implies a gradual change over a period of time. Changes in the data itself, or in the relationship between the predictors and the target variable, are further categorized as given below.

Data Drift

Data drift is defined as a change in the distribution of data. In ML terms, it is the change in the distribution of scoring data against the baseline/training data. This can be further broken down into two categories.

1. Feature Drift

Models are trained on features derived from the input or training data. When the statistical properties of the input data change, this can have a domino effect on model performance and business Key Performance Indicators (KPIs). It can happen gradually, through changes in trend, seasonality, or preferences, or suddenly, through uncontrolled external influencing factors. This deviation in the data over the model’s time in the market is called feature drift.

Fig 2 – Illustrating the model’s time-to-market journey

Additionally, failing to identify drift in time may lead to a severe negative impact: recommending the wrong playlist to a user is far less harmful than suggesting a lousy investment option, which can damage the business or the brand. To mitigate this risk, data drift can sometimes be handled simply by retraining the existing model on the latest data. At other times, one may have to replan everything from scratch or even scrap the model.
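As a concrete illustration, feature drift can be monitored by comparing the live distribution of a feature against its training-time baseline. The sketch below uses the Population Stability Index (PSI), a common drift metric; the synthetic data, the `psi` helper, and the thresholds are illustrative assumptions, not taken from this article.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero / log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(35, 8, 5000)   # e.g. applicant ages at training time
stable = rng.normal(35, 8, 5000)     # live sample with no change
shifted = rng.normal(45, 8, 5000)    # live sample whose mean has moved

psi_stable = psi(baseline, stable)    # stays below the 0.1 "stable" threshold
psi_shifted = psi(baseline, shifted)  # exceeds the 0.25 "major drift" threshold
```

In practice such a check would run on a schedule against logged scoring data, with alerting wired to the chosen thresholds.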

2. Label Drift

Label drift is a change in the distribution of the model’s output. In a loan-defaulter use case, if the share of approvals in production is higher than in the test output, this is an example of label drift.
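A minimal monitor for the loan example above can compare the approval rate seen at training time against the rate observed in production. The data and the 10-percentage-point tolerance below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1 = loan approved, 0 = rejected (synthetic labels for illustration).
train_labels = rng.binomial(1, 0.30, 10_000)  # ~30% approvals at training time
live_labels = rng.binomial(1, 0.55, 10_000)   # approvals have jumped in production

train_rate = train_labels.mean()
live_rate = live_labels.mean()

# Flag label drift when the approval rate moves by more than an agreed
# tolerance (here 10 percentage points).
label_drift = abs(live_rate - train_rate) > 0.10
```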

Fig 3 – Explains the concept of Feature and Label drift with the decision boundary

Concept Drift

Concept drift occurs when there is a change in the relationship between the input and output of the model. Consider that a model, once trained, establishes a relationship P(Y|X) between an input feature (X) and an output label (Y). Concept drift is further classified into two categories:

1. Virtual Drift: when the distribution of the model input changes, but the relationship P(Y|X) between input and true label does not, so model performance remains unchanged.

2. Real Drift: when the relationship P(Y|X) between the model input and true label changes, resulting in a change in model performance.

Fig 4 – Illustrates Virtual drift v/s Real drift

Consider a loan application example: lenders often look at the applicant’s age, background checks, salary, loan history, credit ratings, etc. Age is often an essential feature (with higher weightage) in a loan-processing AI application. During the pandemic, however, the government announced several measures to curb interest rates on home loans, and many older people also started applying. This change in behavior tends to decay the model unless it is retrained to capture the new relationship and made robust to the current market scenario.
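The loan example can be sketched numerically: the input distribution (applicant ages) stays the same, but the true relationship between age and the label changes, so a frozen model loses accuracy. The rule, thresholds, and data below are invented purely to illustrate real concept drift.

```python
import numpy as np

rng = np.random.default_rng(11)

# A frozen pre-pandemic "model": approve applicants younger than 60.
def frozen_model(age):
    return (age < 60).astype(int)

ages = rng.uniform(20, 80, size=5000)   # P(X): the applicant mix is unchanged

# Pre-pandemic ground truth matched the rule the model learned.
y_before = (ages < 60).astype(int)
# Post-pandemic, lower rates draw older applicants in: P(Y|X) has changed.
y_after = (ages < 75).astype(int)

acc_before = float(np.mean(frozen_model(ages) == y_before))  # perfect fit
acc_after = float(np.mean(frozen_model(ages) == y_after))    # accuracy drops
```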

Fig 5 – Shows the effect of model retraining on the model accuracy

Prediction drift

Prediction drift is a deviation in a model’s predictions: a change in the distribution of the model’s predictions over time. If the distribution of the predicted output label changes significantly, the underlying data has most likely changed significantly as well. For example, over a given period, the predictions become increasingly skewed towards a particular class label even though there is no model decay.
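A simple prediction-drift monitor can track the share of each predicted class across monitoring windows. The windows, class names, and 15-point tolerance below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Predicted labels logged during two monitoring windows (synthetic).
week_1 = rng.choice(["approve", "reject"], size=4000, p=[0.5, 0.5])
week_8 = rng.choice(["approve", "reject"], size=4000, p=[0.8, 0.2])

share_w1 = float(np.mean(week_1 == "approve"))
share_w8 = float(np.mean(week_8 == "approve"))

# Flag prediction drift when the "approve" share moves past a set tolerance.
prediction_drift = abs(share_w8 - share_w1) > 0.15
```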

Drift handling

Retraining the model with current data or additional labels is always a safe option for handling drift. However, given the time and cost involved, this might not always be the right approach.

Usually, it is best to find the root cause first: check the training features, compare their distributions pre- and post-deployment, analyze the label distributions, and so on.

Piotr (Peter) Mardziel’s blog on drift in Machine Learning describes several feature-level strategies for handling drift, outlined below:

Replace the feature & retrain the model

One way to alleviate drift is to replace the feature causing it with a constant mean/median/mode value. This saves the effort required to retrain the model or build a new one from scratch, making it a cost-effective way of preserving model performance.
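A sketch of this strategy, assuming synthetic data and scikit-learn: the drifting column is frozen at its training-time median both when (re)fitting and when scoring, so the model stops reacting to the drifted values. All names and data here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Two stable features plus one (column 2) that later drifts in production.
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Neutralize the drifting feature: freeze it at its training-time median.
median_val = np.median(X_train[:, 2])
X_patched = X_train.copy()
X_patched[:, 2] = median_val
model = LogisticRegression().fit(X_patched, y_train)

# Apply the same substitution to incoming production rows before scoring.
X_live = rng.normal(size=(100, 3))
X_live[:, 2] += 5.0           # simulate drift in column 2
X_live[:, 2] = median_val     # the fix: the model never sees drifted values
preds = model.predict(X_live)
acc_live = float(np.mean(preds == (X_live[:, 0] + X_live[:, 1] > 0)))
```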

Remove the feature & retrain the model

This approach discards the features causing drift, with a plan to retrain the model on new alternate or remaining features to recover performance. It is resource intensive because it involves finding new features, retraining the model, and benchmarking the result to find a suitable replacement.

Add the feature & retrain the model

Another approach to drift alleviation involves completely retraining the model with additional features alongside the existing input setup (including the feature causing drift). One can include an indicator feature flagging whether an instance comes from the drifted regime, which helps the model weigh the drift appropriately when predicting the output for that instance.
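One way to sketch the drift-indicator idea, under invented data: the label threshold moves between two regimes, and a 0/1 indicator column lets a single retrained model serve both. The regimes and thresholds are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)

# Pre-drift regime: the label flips at a score of 0.
X_old = rng.normal(size=(500, 1))
y_old = (X_old[:, 0] > 0.0).astype(int)

# Post-drift regime: the effective threshold has moved to 1.
X_new = rng.normal(size=(500, 1))
y_new = (X_new[:, 0] > 1.0).astype(int)

# Append a drift-indicator column: 0 = old regime, 1 = new regime.
X = np.vstack([
    np.hstack([X_old, np.zeros((500, 1))]),
    np.hstack([X_new, np.ones((500, 1))]),
])
y = np.concatenate([y_old, y_new])

model = LogisticRegression().fit(X, y)
acc = model.score(X, y)   # the indicator lets one model fit both regimes
```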

Re-weight the data & retrain the model

Ideally, the distributions of input and output should not vary between training time and deployment. Realistically, however, this variation is often the actual cause of drift. If a few training samples resemble the data causing drift in production, we can re-weight (up-weight) those instances and retrain the model. This way, the model learns more from such samples and better addresses the drift.
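With scikit-learn, this re-weighting amounts to passing `sample_weight` to `fit`. The weighting rule below (up-weight rows that resemble the drifted regime) and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)

# Up-weight the samples that resemble the drifted production data
# (here: rows with large values in feature 0, purely for illustration).
weights = np.where(X[:, 0] > 1.0, 5.0, 1.0)

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)   # weighted retraining
acc = model.score(X, y)
```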

Alternative methods

Retrain the model on the latest data alone, or on both old and new data together, provided this does not degrade the model’s performance. If the model supports sample weighting, use all the data but assign higher weights to newer data, forcing the model to pay more attention to it during training. Another option is online learning, where the model learns in near real time from scoring data and feedback.
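The online-learning option can be sketched with scikit-learn’s `SGDClassifier` and its `partial_fit` method, which updates the model one mini-batch at a time as feedback arrives. The stream simulation below is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(9)
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])

# Simulate a stream: each day a batch of scored rows arrives together with
# feedback labels, and the model takes a small update step on that batch only.
for day in range(30):
    X_batch = rng.normal(size=(64, 4))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)

# Evaluate on fresh data drawn from the same (current) distribution.
X_eval = rng.normal(size=(500, 4))
y_eval = (X_eval[:, 0] + X_eval[:, 1] > 0).astype(int)
acc = model.score(X_eval, y_eval)
```

Because each update uses only the newest batch, the model naturally tracks a slowly changing data distribution without full retraining.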

Contact us today at Refract by Fosfor to learn how you can avoid Model Drift.


Manish Singh

Senior Specialist - Data Science, Refract by Fosfor

Manish Singh has 11+ years of progressive experience in executing data-driven solutions. He is adept at handling complex data problems, implementing efficient data processing, and delivering value. He is proficient in machine learning and statistical modelling algorithms/techniques for identifying patterns and extracting valuable insights. He has a remarkable track record of managing complete software development lifecycles and accomplishing mission-critical projects. And finally, he is highly competent in blending data science techniques with business understanding to transform data into business value seamlessly.
