Carl Handlin Talks Machine Learning Model Monitoring in Production
Carl Handlin, former Head of Data and AI at Rappi and currently the co-founder & CTO of Trully, joined the Worldwide AI Webinar to discuss ML model monitoring in production. Here are a few highlights from his talk.
Check out his keynote on our website and YouTube channel. 

3 groups of ML problems in a production environment

Carl emphasized the importance of monitoring machine learning models because real-world data is non-stationary: unforeseen and unpredictable events can change the whole course of action. In his words, “we can’t be always 100% prepared but we can be prepared to deal with certain types of events” with the help of constant monitoring. Moreover, relying on real-world data adds another layer of complexity when trying to accurately assess a model’s performance.
He went on to identify three groups of problems that machine learning faces in a production environment:

Focusing on Model Decay is the way to go 

To manage these issues, Carl suggested watching for Model Decay, the degradation of a model’s accuracy over time, which is a consequence of drift.
Model Decay happens for four reasons:
Model Decay can occur in different ways:

Dealing with Drift

To solve this problem, Carl recommended checking input data quality first: problems there are often caused by broken pipelines, schema changes, infrastructure updates, and the like.
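As a minimal sketch of what such an input-quality check could look like: the column names, expected types, and checks below are illustrative assumptions, not details from the talk.

```python
import math

# Hypothetical expected schema for incoming feature rows (illustrative only).
EXPECTED_COLUMNS = {"age": float, "income": float, "country": str}

def check_batch(rows):
    """Return a list of data-quality issues found in a batch of input rows."""
    issues = []
    for i, row in enumerate(rows):
        for col, col_type in EXPECTED_COLUMNS.items():
            if col not in row:
                issues.append(f"row {i}: missing column '{col}'")
            elif not isinstance(row[col], col_type):
                issues.append(f"row {i}: column '{col}' has wrong type")
            elif col_type is float and math.isnan(row[col]):
                issues.append(f"row {i}: column '{col}' is NaN")
    return issues

batch = [
    {"age": 34.0, "income": 52000.0, "country": "MX"},
    {"age": float("nan"), "income": 48000.0},  # NaN age, missing country
]
print(check_batch(batch))
```

A broken pipeline or an unannounced schema change would typically surface here as missing columns, wrong types, or NaN floods, before any drift analysis is even needed.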
Then a Drift Analysis, comparing current data against reference data, is required. The most straightforward kind is Performance Drift analysis, which checks whether the model’s actual performance has changed. However, performance drift analysis requires ground truth (labels), which is not always available. The alternative is Input Data Drift analysis, which examines the distribution of features in the evaluated data and does not require ground truth.
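A performance-drift check can be sketched in a few lines: compare a recent accuracy window against the reference accuracy measured at deployment time. The tolerance threshold below is an assumption for illustration, not a value from the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def performance_drift(ref_acc, y_true, y_pred, tolerance=0.05):
    """Flag drift when live accuracy drops more than `tolerance` below reference."""
    live_acc = accuracy(y_true, y_pred)
    return live_acc < ref_acc - tolerance, live_acc

# Reference accuracy of 0.92 at deployment; a recent labeled window scores 5/8.
drifted, live = performance_drift(
    0.92, [1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 1, 0, 1]
)
print(drifted, live)
```

Note that this only works once labels arrive, which is exactly the limitation that motivates input-data-drift analysis.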
Mr. Handlin went on to share a few ways to detect drift. Should you have the labels quickly, you can use the following six methods:
If you don’t have the labels quickly, statistical methods like Population Stability Index (PSI), Jensen-Shannon Divergence, Kullback-Leibler Divergence, or Kolmogorov-Smirnov Test would suffice.
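As one worked example of these label-free statistical methods, here is a minimal from-scratch Population Stability Index between a reference sample and a current sample of a single feature. The bin count and the decision thresholds follow a common rule of thumb (PSI below 0.1 is usually read as stable, above 0.25 as significant shift); the synthetic data is illustrative.

```python
import math
import random

def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one feature.
    Bin edges come from the reference distribution's quantiles."""
    ref_sorted = sorted(reference)
    edges = [ref_sorted[int(len(ref_sorted) * i / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1  # bin index for x
        # Tiny epsilon avoids zero proportions, which break the log term.
        return [(c + 1e-6) / len(sample) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

random.seed(0)
ref = [random.gauss(0, 1) for _ in range(5000)]        # training-time feature
same = [random.gauss(0, 1) for _ in range(5000)]       # stable production data
shifted = [random.gauss(1.0, 1) for _ in range(5000)]  # drifted production data
print(round(psi(ref, same), 3))     # near 0: no meaningful shift
print(round(psi(ref, shifted), 3))  # large: clear distribution drift
```

The other tests Carl named (Jensen-Shannon, Kullback-Leibler, Kolmogorov-Smirnov) follow the same pattern of comparing the current feature distribution against the reference one, so no labels are needed.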
Finally, if drift is detected, you should:
  1. Check the data quality
  2. Investigate
  3. Retrain if possible
  4. Rebuild if needed
  5. Use a fallback strategy
  6. Limit the model use
  7. Add custom processing logic
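Step 5, the fallback strategy, can be sketched as a simple routing decision: when drift is flagged, serve a simpler, more conservative model instead of the primary one. Both model functions below are hypothetical stand-ins, not names from the talk.

```python
def primary_model(x):
    """Stand-in for the main production model."""
    return 1 if x > 0.5 else 0

def fallback_model(x):
    """Stand-in for a conservative baseline (e.g. always predict the safe class)."""
    return 0

def predict(x, drift_detected):
    """Serve the primary model normally; route to the fallback under drift."""
    model = fallback_model if drift_detected else primary_model
    return model(x)

print(predict(0.9, drift_detected=False))  # primary model
print(predict(0.9, drift_detected=True))   # fallback model
```

The same switch point is also where one could limit model use or add custom processing logic (steps 6 and 7), since all traffic already passes through it.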
To listen to Carl Handlin’s whole keynote, check out our website and YouTube channel.