For entry and mid-level positions, the cost to replace an employee is between 30% and 150% of their annual salary. For executives, this figure can rise up to 400%.
Staff turnover costs companies big money. Needless to say, the time and resources spent on recruitment, onboarding, and training are wasted when the employee leaves, and the loss of knowledge adds to the company's existing pile of losses. The most frustrating part is that the constant risk of employees leaving prevents the organisation from working proactively on the underlying issues behind the high attrition rate.
Even today, several organisations rely on gut feelings or obsolete data when putting forward preventive measures against employee turnover. Not surprisingly, the management team gets caught off guard every time one of its top talents leaves the organisation.
Since the inception of Winningtemp, our product has helped more than 400 companies visualise the accurate state of employee well-being in real time. It enables managers and HR leaders to act on day-to-day data and quickly see the impact of various activities on the overall results. This has been an immense step forward in defining the future of work.
However, it still did not provide users with churn indicators or the ability to identify the issues behind staff turnover.
Introducing Winningtemp Smart Prediction. Our data scientists have been working with artificial intelligence and deep learning to make Winningtemp more intuitive and robust. Smart Prediction works with millions of data points to find patterns in real time and sends warning signals to notify managers of risks and opportunities.
The turnkey function adapts itself to your organisation's ecosystem, analyses the results, and transforms the time-series data into digestible information in order to:
- Point out high-risk employee groups
- Estimate the approximate time until an employee quits
- Identify the factors that can contribute to employees leaving
- Suggest concrete actions that will help managers reduce staff turnover
How does Smart Prediction work?
To get insights about employee turnover from data, we need to somehow transform a stream of answers into estimates of when each user will quit. We also need to model uncertainty in these estimates so that we can accurately analyse the risk over different periods. This requires a model that can find and represent the intricate patterns inherent in users' answers.
The setup is illustrated in the diagram below, where we have historical data with answers to different questions on the left and to the right a probability distribution over time to the event that the user quits.
This is a supervised learning problem where the data consists of variable-length sequences of events, and it is not immediately obvious how to represent the explanatory variables. A simple approach would be to calculate various aggregates over rolling time windows. We decided to instead feed the raw event stream directly into a Recurrent Neural Network (RNN), which has the capability to learn relevant features on its own.
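For contrast, the "simple approach" mentioned above could look roughly like the following sketch. It is only an illustration of rolling-window aggregates, not the method Smart Prediction uses; the `(day, score)` event layout, the function name `rolling_features`, and the window lengths are all assumptions made for this example.

```python
from statistics import mean

def rolling_features(events, as_of_day, windows=(30, 90)):
    """Aggregate one user's answers over trailing time windows.

    events: list of (day, score) tuples (hypothetical layout).
    as_of_day: the day on which the features are computed.
    """
    features = {}
    for w in windows:
        # Keep answers given in the half-open window (as_of_day - w, as_of_day].
        scores = [s for d, s in events if as_of_day - w < d <= as_of_day]
        features[f"count_{w}d"] = len(scores)
        features[f"mean_{w}d"] = mean(scores) if scores else None
    return features

# Made-up answer history: scores drop over time.
events = [(1, 8), (35, 7), (70, 5), (95, 3)]
feats = rolling_features(events, as_of_day=100)
print(feats)
```

Hand-crafted features like these force a choice of window lengths and aggregates up front, which is the kind of decision that feeding the raw event stream to an RNN avoids.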
The target variables, i.e. the network output, are the parameters of a probability distribution that describes when the user is expected to quit and how certain the model is about that estimate. This differs from regular binary churn classification in one important way: we need not specify a fixed churn definition before training the model. The end result is an interpretable and flexible model that can be used to predict employee turnover over any time period.
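To illustrate why no fixed churn definition is needed, here is a minimal sketch that turns distribution parameters into a quit probability for an arbitrary period. It assumes a continuous two-parameter Weibull distribution with scale `alpha` and shape `beta`; the function names and the parameter values are made up for the example and are not taken from the production model.

```python
import math

def weibull_cdf(t, alpha, beta):
    """P(T <= t) for a Weibull distribution with scale alpha and shape beta."""
    return 1.0 - math.exp(-((t / alpha) ** beta))

def quit_probability(start, end, alpha, beta):
    """Probability that the quit event falls in the period (start, end]."""
    return weibull_cdf(end, alpha, beta) - weibull_cdf(start, alpha, beta)

# Hypothetical network output for one user, with time measured in days.
alpha, beta = 400.0, 1.5

p_6m = quit_probability(0, 182, alpha, beta)             # next six months
p_12m = quit_probability(0, 365, alpha, beta)            # next twelve months
p_second_half = quit_probability(182, 365, alpha, beta)  # months 7 to 12

print(f"6 months: {p_6m:.3f}, 12 months: {p_12m:.3f}")
```

The same two parameters answer the question for any horizon, so the period of interest can be chosen after training rather than before.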
When evaluating the accuracy of a model on historical data, we can see how well the probability distributions match the actual outcomes of users who have quit, primarily by evaluating how likely the model is to generate the observed data. For currently employed users, however, all we know is that they did not quit before today's date. In Survival Analysis, this is called the censoring point, and the target for active users is to push the probability distribution beyond the point of censoring. By utilising all available data, every user, including the currently active ones, contributes to the model training process.
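The treatment of censored users can be sketched with a standard censored log-likelihood. This is a minimal illustration assuming a continuous Weibull distribution with scale `alpha` and shape `beta`; the function name and the numbers are hypothetical, and the actual training objective may differ in detail.

```python
import math

def weibull_log_likelihood(t, observed, alpha, beta):
    """Log-likelihood of one user under Weibull(scale=alpha, shape=beta).

    t: time until the user quit (observed=True), or time employed so far,
       i.e. the censoring point (observed=False).
    """
    z = (t / alpha) ** beta
    if observed:
        # Log density: the user quit exactly at time t.
        return math.log(beta) - math.log(alpha) + (beta - 1) * math.log(t / alpha) - z
    # Log survival function: all we know is that the user had not quit by time t.
    return -z

# A user who is still employed after 300 days (censored observation).
near = weibull_log_likelihood(300, observed=False, alpha=200.0, beta=1.5)
far = weibull_log_likelihood(300, observed=False, alpha=800.0, beta=1.5)
print(near, far)
```

For the censored user, the distribution placed further out in time (`alpha=800`) scores higher than the one centred before the censoring point, which is what "pushing the distribution beyond the point of censoring" amounts to during training.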
How is the Smart Prediction model built?
Our approach is based on Deep Learning using recurrent neural networks (RNNs) with a Long Short-Term Memory (LSTM) architecture. The network's feedback connections allow the model to identify and retain patterns in sequences of answers. It is implemented using PyTorch, a Python framework for differentiable programming.
What else can the model be used for?
At the lowest level, the model output consists of the two parameters of a Weibull probability distribution, which control its scale and shape. This approach is mostly inspired by the thesis and accompanying blog post by Egil Martinsson. It allows us to further calculate:
- The expected time until a user quits.
- The number of users per group that have a very high risk of leaving within, say, six months.
- How every single answer affects the churn probability.
The last item is derived from the ability to track the model's prediction over time, effectively allowing us to attribute employee turnover to every specific answer. This allows us to construct per-group recommendations on which question categories should be prioritised to reduce employee turnover.
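The three quantities listed above can be sketched from the Weibull parameters as follows. This is an illustration under stated assumptions: a continuous Weibull with scale `alpha` and shape `beta`, a 50% risk threshold, a 182-day horizon, and attribution computed as the change in predicted risk after each answer. All names and numbers are hypothetical.

```python
import math

def expected_days_until_quit(alpha, beta):
    """Mean of a Weibull distribution: alpha * Gamma(1 + 1/beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)

def quit_probability_within(days, alpha, beta):
    """P(T <= days) under Weibull(scale=alpha, shape=beta)."""
    return 1.0 - math.exp(-((days / alpha) ** beta))

def high_risk_counts(predictions, horizon_days=182, threshold=0.5):
    """Count users per group whose quit probability within the horizon
    reaches the threshold. predictions: (group, alpha, beta) tuples."""
    counts = {}
    for group, alpha, beta in predictions:
        risky = quit_probability_within(horizon_days, alpha, beta) >= threshold
        counts[group] = counts.get(group, 0) + int(risky)
    return counts

def answer_attributions(param_history, horizon_days=182):
    """Change in predicted risk caused by each new answer.

    param_history: (alpha, beta) predicted after each successive answer.
    """
    risks = [quit_probability_within(horizon_days, a, b) for a, b in param_history]
    return [after - before for before, after in zip(risks, risks[1:])]

# Made-up per-user predictions.
predictions = [("sales", 150.0, 1.2), ("sales", 900.0, 1.5), ("support", 200.0, 2.0)]
counts = high_risk_counts(predictions)

# Made-up prediction track for one user: each answer shortens the expected tenure.
history = [(600.0, 1.5), (500.0, 1.5), (250.0, 1.5)]
deltas = answer_attributions(history)

print(expected_days_until_quit(100.0, 1.0), counts, deltas)
```

Summing such per-answer changes by question category within a group is one way the per-group recommendations described above could be assembled.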
What’s in store for the future?
For an upcoming release, we are working on predicting the answers to individual questions, generating a predictive index for each question category. This will help new customers focus on areas where their time is well spent and significantly reduce the time it takes to receive their first insight.
We are also working on Natural Language Processing (NLP) models that will structure and help navigate a large amount of textual feedback given in the system. By modelling natural language, we are able to extract the essence of a text and connect it to other essential data.