Empowering Organizations to solve Attrition with AI



Employees who start and end their careers in a single business organization are rare. Most switch jobs after a few years of service in any given organization. Although the reasons vary from case to case, these departures are either voluntary or organization-driven.

That being said, abrupt voluntary attrition is no longer a simple matter: it can seriously impact business growth KPIs. For example, frequent and unplanned hiring cycles are costly and hamper operational productivity. Global HR teams have been looking for solutions to this phenomenon. They have devised several programs and perks, and have explored the potential of artificial intelligence and business intelligence to retain employees. But these efforts have not been adequate or, more accurately, precise enough to achieve their intended purpose.

Leveraging AI technology to understand employee attrition could be the way forward. Forecasting attrition trends could help organizations make meaningful changes in the present, enhancing retention in the future.

In this document, we will explore how enterprises can build resilient HR departments, capable of understanding and mitigating employee attrition issues.

At Fosfor, we have built Refract, a solution that lets us understand and accurately predict future churn events. Refract is easy to use and presents visually consumable insights, so business users can understand the underlying attrition trends and address future attrition possibilities in the present.

But to do this, we had to first understand the causes of attrition or churn in any given organization.

While every employee’s reason for departing the organization may vary, there are several common factors that often contribute to employees’ voluntary churn. Employees who are planning to leave an organization are often looking for:

  • Higher compensation
  • Career advancement
  • Improved work-life balance
  • Better organizational culture and values
  • Better recognition in an organization
  • Easier commute or relocation

Although the organization’s culture and a lack of recognition play crucial roles in employee attrition, these factors are very difficult to collect and quantify. So, in this case study, we study drivers such as compensation, work-life balance, and commute, in correlation with demographics such as age and sex.

An American healthcare service provider solved attrition issues with Fosfor’s Refract.

Leading American healthcare providers are facing serious personnel shortages due to rising attrition rates. Nursing staff, who play a pivotal role in delivering necessary healthcare to patients at the grassroots level, are at the core of this phenomenon.

A few key challenges that organizations must overcome to mitigate churn:

  • Inability to accurately analyze the termination history data and respective reasons for attrition.
  • Not having an appropriate understanding of external factors like industry headwinds, global macro trends, industry sentiment data, and their possible impact on the HR data.
  • A lack of intelligence mechanisms to build and maintain data pipelines that can present the data in easy-to-consume formats.
  • Inability to understand the history of events and data, and look into the future to know how to pivot.

The Refract Solution:

Refract provides users with an enterprise AI platform to track and consume the model performance. The data manipulation and modeling happen in the Snowflake environment, ensuring data security. The Streamlit applications are built and hosted on top of Refract, providing users with an interface to consume insights generated from the model directly through dynamic visualization widgets.

Key benefits of using Refract:

  • Powerful, flexible Data Science pipelines built through Snowpark for Python.
  • Quick visualizations of attrition facts & figures through Streamlit (quicker turnaround for everyday visualizations)
  • Interactive storytelling with a quick app-style deployment of notebooks

Solution Workflow

Decoding the Data to Insights Journey

To understand and predict employee attrition, we used the following data points:

  • Employee demographics such as year of birth, sex, distance from the workplace, type of degree, tenure in the organization, and ethnicity along with their salary details.
  • Organizational details such as the type of hospital, the type of hospital ownership, and the US state the hospital belongs to.
  • Data for how much overtime each employee records.
  • The churn variable, which indicates the employees who have churned.

Step 1: We used Refract’s integrated Jupyter notebooks, which connect to the Snowflake instance, to start a new session. Because the Snowpark API requires Python 3.8, Refract lets users choose the relevant version of Python.

Step 2: We created a data pre-processing pipeline for simple manipulations such as missing value imputation, data scaling, and one-hot encoding. We followed up with a model training pipeline consisting of a Random Forest algorithm and a grid search for finding the best parameters. All of this code is written in a separate .py file as Python functions, and the file is then sent to the Snowflake stage that you will be using. The training function is then registered as a stored procedure using Snowpark SQL.
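To make the three pre-processing steps concrete, here is a minimal plain-Python sketch of mean imputation, standard scaling, and one-hot encoding. It only illustrates what each step does; the actual pipeline is not reproduced in this post and would more likely rely on a library such as scikit-learn. The function names and the toy values are invented for illustration.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standard_scale(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per (sorted) category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Toy columns (invented values): a salary column with a gap, and a category.
salaries = [52000, None, 61000, 58000]
hospital_types = ["A", "B", "A", "C"]

scaled_salary = standard_scale(impute_mean(salaries))
categories, encoded = one_hot(hospital_types)
```

The model training step (Random Forest with grid search) would consume the scaled and encoded columns produced here.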

The same upload approach works for any file you want to send to your Snowflake stage.
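As a rough sketch of the upload step, the helper below wraps Snowpark's `session.file.put` call. It assumes `session` is an already connected `snowflake.snowpark.Session`; the file name and stage name in the usage comment are hypothetical.

```python
def upload_to_stage(session, local_path, stage_name):
    """Upload a local file to a Snowflake stage, replacing any older copy."""
    # `session` is assumed to be a connected snowflake.snowpark.Session.
    return session.file.put(
        local_path,
        f"@{stage_name}",
        auto_compress=False,  # keep the .py file uncompressed so it can be imported
        overwrite=True,       # replace a previously uploaded version
    )


# Example (hypothetical names):
# upload_to_stage(session, "train_pipeline.py", "ml_stage")
```

Disabling compression is a design choice here: a Python file that is later listed in a procedure's imports is easier to work with when it is stored as-is.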

Registering the model training pipeline makes it available as a stored procedure. Remember to list the Python modules your training pipeline uses, and if your pipeline has multiple functions, specify the main function as the handler of the stored procedure (SPROC).
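A registration statement of this kind could look roughly like the sketch below, which builds the SQL text for a Python stored procedure. The procedure name, stage path, handler, argument, and package list are all assumptions made for illustration; only the general `CREATE PROCEDURE ... LANGUAGE PYTHON` form comes from Snowflake.

```python
def build_sproc_sql(proc_name, stage_file, handler, packages, runtime="3.8"):
    """Return a CREATE PROCEDURE statement for a Python handler kept on a stage."""
    package_list = ", ".join(f"'{p}'" for p in packages)
    return (
        f"CREATE OR REPLACE PROCEDURE {proc_name}(skip_columns ARRAY)\n"
        "RETURNS STRING\n"
        "LANGUAGE PYTHON\n"
        f"RUNTIME_VERSION = '{runtime}'\n"
        f"PACKAGES = ({package_list})\n"  # modules the training code needs
        f"IMPORTS = ('{stage_file}')\n"   # the uploaded .py file
        f"HANDLER = '{handler}'"          # module.function to run
    )


# Hypothetical names throughout.
sproc_sql = build_sproc_sql(
    proc_name="train_attrition_model",
    stage_file="@ml_stage/train_pipeline.py",
    handler="train_pipeline.main",
    packages=["snowflake-snowpark-python", "scikit-learn", "pandas"],
)
# With a live Snowpark session: session.sql(sproc_sql).collect()
```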

You can then trigger the training stored procedure, passing the variables that you do not want to use in training.
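One way to trigger the procedure is a plain `CALL` statement. The sketch below builds that statement and passes the excluded variables as an array; the procedure name and column names are hypothetical.

```python
def build_training_call(proc_name, skip_columns):
    """Return a CALL statement that passes the columns to leave out of training."""
    # Double any single quotes so column names are safe inside SQL literals.
    quoted = ", ".join("'" + c.replace("'", "''") + "'" for c in skip_columns)
    return f"CALL {proc_name}(ARRAY_CONSTRUCT({quoted}))"


# Hypothetical procedure and column names.
call_sql = build_training_call(
    "train_attrition_model", ["EMPLOYEE_ID", "EMPLOYEE_NAME"]
)
# With a live Snowpark session: session.sql(call_sql).collect()
```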

Step 3: As in step 2, you can create a separate file that defines your prediction functions, send that file to your Snowflake stage, and then register the functions in Snowflake.

Here, we register the prediction file as a Snowpark UDF (user-defined function), with almost the same imports and handler as the training stored procedure.
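A UDF registration could be sketched the same way as the stored procedure, this time with `CREATE FUNCTION` and typed input arguments. The function name, argument names and types, stage files, and handler are assumptions for illustration, not the names used in the original project.

```python
def build_udf_sql(udf_name, arg_types, stage_files, handler, packages, runtime="3.8"):
    """Return a CREATE FUNCTION statement for a Python UDF kept on a stage."""
    args = ", ".join(f"{name} {sql_type}" for name, sql_type in arg_types.items())
    package_list = ", ".join(f"'{p}'" for p in packages)
    import_list = ", ".join(f"'{f}'" for f in stage_files)
    return (
        f"CREATE OR REPLACE FUNCTION {udf_name}({args})\n"
        "RETURNS FLOAT\n"
        "LANGUAGE PYTHON\n"
        f"RUNTIME_VERSION = '{runtime}'\n"
        f"PACKAGES = ({package_list})\n"
        f"IMPORTS = ({import_list})\n"  # prediction code plus the saved model
        f"HANDLER = '{handler}'"
    )


# Hypothetical names throughout.
udf_sql = build_udf_sql(
    udf_name="predict_churn",
    arg_types={"TENURE": "FLOAT", "SALARY": "FLOAT"},
    stage_files=["@ml_stage/predict.py", "@ml_stage/model.pkl"],
    handler="predict.predict",
    packages=["scikit-learn", "pandas"],
)
# With a live Snowpark session: session.sql(udf_sql).collect()
```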

We can then call the UDF to run predictions on an entire Snowflake table (sdf).
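Scoring a whole table with a registered UDF can be expressed as a single query. The sketch below builds that query as SQL text; the original work may well have used the Snowpark DataFrame API on `sdf` instead, and the table, UDF, and column names here are hypothetical.

```python
def build_scoring_query(table, udf_name, feature_columns,
                        output_column="CHURN_PROBABILITY"):
    """Return a SELECT that adds the UDF's prediction as a new column."""
    args = ", ".join(feature_columns)
    return f"SELECT *, {udf_name}({args}) AS {output_column} FROM {table}"


# Hypothetical table, UDF, and feature names.
scoring_sql = build_scoring_query(
    "HR_EMPLOYEES", "predict_churn", ["TENURE", "SALARY"]
)
# With a live Snowpark session: scored = session.sql(scoring_sql)
```

Because the UDF runs inside Snowflake, the rows never leave the warehouse while they are being scored.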

Step 4: Once the model training and predictions were done, we brought the model back to Refract, along with all the files needed for scoring, and registered the model in Refract for monitoring and consumption.

Registering models on Fosfor Refract comes with a lot of benefits:

  • A visual representation of all the build time metrics.
  • Model’s decision making is explained.
  • Automatic API creation for basic consumption of the model.
  • Model monitoring can be scheduled to look out for any possible data drift.
  • Multiple versions can be deployed and compared.
  • Option to choose the compute power while deploying the model, giving you better control over resource utilization.
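To illustrate what a data drift check can look like, the sketch below computes the Population Stability Index (PSI) for one numeric feature. PSI is a commonly used drift measure chosen here for illustration; this post does not state which drift metric Refract uses, and the function name and bin count are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a newer sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bucket.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data (invented): the newer sample is shifted relative to the baseline.
baseline = list(range(100))
shifted = [v + 30 for v in baseline]
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
```

Identical samples give a PSI of zero, and the value grows as the newer sample moves away from the baseline, so a scheduled job can raise an alert once it crosses a chosen threshold.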

Step 5: Once the model is deployed on Refract, you will get the link to the API where you can provide the input payload and get the predictions. For small-scale testing, on the other hand, you can consume the model directly in Refract itself.
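Calling such an API from Python might look like the sketch below, which prepares a JSON POST request with the standard library. The URL, the bearer-token header, and the `payload` field name are assumptions; the real endpoint and payload format come from the deployed model's API details in Refract.

```python
import json
import urllib.request


def build_prediction_request(api_url, token, records):
    """Prepare a JSON POST request carrying the rows to score."""
    body = json.dumps({"payload": records}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical endpoint and input fields.
request = build_prediction_request(
    "https://refract.example.com/models/attrition/predict",
    "MY_TOKEN",
    [{"tenure_years": 2, "salary": 60000}],
)
# To send it: urllib.request.urlopen(request).read()
```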

Consume attrition insights with the HR Analytics app.

We have built and deployed a Streamlit-based app that gives the user a technical and visual understanding of the data, so that any stakeholder can understand the relationships between the variables, and understand the possible causes of churn among employees.

The data profile tab in the app shows a snippet of the data so that you can understand what you are working with. Below the snippet, you get the technical profile of every variable in the data: distributions, cardinality, missing values, correlations, and so on. This type of business intelligence profiling enables you to understand the data, decide what kind of pre-processing is needed for better modelling, and define the steps for pre-processing.
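A few of these profile figures are simple to compute by hand. The following sketch, with an invented function name and toy values, reports the missing count and cardinality of a column, plus the range and mean when the column is numeric; it is not the app's own profiling code.

```python
def profile_column(values):
    """Summarise one column: size, missing count, cardinality, numeric range."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "cardinality": len(set(present)),
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Only report numeric statistics when every present value is numeric.
    if numeric and len(numeric) == len(present):
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
        summary["mean"] = sum(numeric) / len(numeric)
    return summary


# Toy columns (invented values).
overtime_profile = profile_column([4, None, 10, 10])
degree_profile = profile_column(["BSN", "ADN", None, "BSN"])
```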

The Know your Data tab provides users with a visual representation of all the variables, with the appropriate graph for each data type.

In this tab's chart, you can select any variable and plot it against the salary data to understand the possible relationship between them. For example, Acute Care – DOD hospitals on average have lower salaries compared to other hospital types, which could be a reason for their high churn rate.

This app also enables the user to consume the trained model through a what-if analysis that predicts the churn of an employee for a given set of parameter values.

On the left-side pane, you can change a parameter and see how it affects the predicted churn of any employee. Along with this, we get the important features, the confusion matrix, and model metrics such as accuracy, precision, and recall.
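For readers less familiar with these metrics, the sketch below shows how accuracy, precision, and recall follow from the confusion matrix of a binary churn model. The labels are invented toy data, with 1 meaning the employee churned.

```python
def confusion_matrix(y_true, y_pred):
    """Count outcomes for binary labels where 1 means the employee churned."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_metrics(counts):
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = sum(counts.values())
    predicted_churn = counts["tp"] + counts["fp"]
    actual_churn = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "precision": counts["tp"] / predicted_churn if predicted_churn else 0.0,
        "recall": counts["tp"] / actual_churn if actual_churn else 0.0,
    }


# Toy labels (invented).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_matrix(y_true, y_pred)
metrics = classification_metrics(counts)
```

Precision answers "of the employees flagged as likely to churn, how many did?", while recall answers "of the employees who churned, how many were flagged?".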

All this information, presented in a single place, can enable any user to better understand the data and model to come up with possible reasons for employee churn. The what-if analysis helps users understand churn possibilities in the future for respective employee groups or individuals by tweaking the tenure and salary variables.
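The what-if idea can be sketched as a small loop: copy one employee record, change a single variable, and score each copy. The `toy_score` function below is a made-up stand-in, not the trained Random Forest model, and the field names and coefficients are invented purely so the example runs.

```python
def what_if(score, baseline, feature, candidate_values):
    """Score copies of one employee record with a single feature varied."""
    results = []
    for value in candidate_values:
        scenario = {**baseline, feature: value}  # the baseline is left untouched
        results.append((value, score(scenario)))
    return results


def toy_score(employee):
    """Stand-in scorer (NOT the trained model): invented linear rule, clipped to [0, 1]."""
    risk = 0.9 - 0.05 * employee["tenure_years"] - 0.000005 * employee["salary"]
    return min(max(risk, 0.0), 1.0)


# Hypothetical employee record and salary scenarios.
employee = {"tenure_years": 2, "salary": 60000}
salary_scenarios = what_if(toy_score, employee, "salary", [50000, 60000, 70000])
```

In the app, the scoring function would be the deployed model, and the left-side pane plays the role of `candidate_values`.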

This app was built on the available HR data of the employees. Information such as organizational reviews can also be used to better understand the culture of the organization, leading to a better understanding of employee attrition.



Ayush Kumar Singh

Specialist – Data Scientist | Fosfor (Refract) | LTIMindtree.

Ayush Kumar Singh has 6+ years of experience in executing data-driven solutions. He is proficient in machine learning and deep learning and is adept at identifying patterns and extracting valuable insights. He has a remarkable track record of delivering end-to-end data science projects.
