Individual Conditional Expectation (ICE) Plots
Partial Dependence Plots (PDPs) have a serious problem: they can show a nearly flat line, suggesting no significant change in the outcome, when in reality individual effects of equal magnitude and opposite sign cancel out in the average. Individual Conditional Expectation (ICE) plots address this issue. They were introduced by researchers from The Wharton School in 2014, building on the work done by Friedman. ICE plots also reveal interactions between variables, which PDPs can do only to a limited extent.
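The cancellation problem can be reproduced with a few lines of NumPy. This is a minimal sketch under stated assumptions: the data are synthetic, and `predict` is a hypothetical stand-in for a fitted model in which the effect of `x1` flips sign depending on `x2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): x1 is the feature of interest,
# x2 decides whether the effect of x1 is positive or negative.
n = 200
x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)  # half +1, half -1
X = np.column_stack([x1, x2])

def predict(X):
    """Hypothetical model: the slope in x1 is +1 or -1 depending on x2."""
    return X[:, 0] * X[:, 1]

grid = np.linspace(-1.0, 1.0, 5)

# One row per instance, one column per grid value of x1.
ice = np.empty((n, grid.size))
for j, value in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, 0] = value         # overwrite x1 for every instance
    ice[:, j] = predict(X_mod)  # x2 stays at its observed value

pdp = ice.mean(axis=0)          # the PDP is the pointwise mean of the ICE curves

print("PDP:", pdp)              # flat, because the two groups cancel
print("ICE for an instance with x2=+1:", ice[0])
print("ICE for an instance with x2=-1:", ice[1])
```

The averaged curve is flat even though every individual curve has a clear slope, which is exactly the situation a PDP alone cannot show.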
ICE plots are built on top of PDPs. They disaggregate the averaged data, making it possible to inspect the effect of a predictor variable for each instance while the values of the other predictor variables are held constant. A basic ICE plot shows how varying the feature value for one instance changes the predicted outcome, with all other feature values of that instance kept fixed. Analyzing the curves for all data points at once can be cumbersome, but an ICE plot can also be drawn for a single instance.
Let us again consider the example of a class whose academic advisor wants to analyze how the students have done in an exam. This time the advisor wants to understand how each individual student has performed and is not interested in a class average. For the subject ‘ExAI’, the advisor can take one student’s scores in all subjects and repeatedly replace that student's ExAI score with the scores obtained by other students, to see how the student's predicted outcome would have changed with each of those scores.
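The advisor's procedure can be sketched for a single student as follows. Everything here is illustrative: the class scores are made up, and `predict_grade` is a hypothetical stand-in for whatever model predicts the outcome from the subject scores.

```python
import numpy as np

def predict_grade(scores):
    """Hypothetical model: final grade from [ExAI, maths, statistics] scores.
    The ExAI score pays off more when the maths score is high (an interaction)."""
    scores = np.asarray(scores, dtype=float)
    exai, maths, stats = scores[:, 0], scores[:, 1], scores[:, 2]
    return 0.2 * stats + 0.4 * maths + 0.4 * exai * (maths / 100.0)

# Made-up class data, one row per student: [ExAI, maths, statistics].
class_scores = np.array([
    [55.0, 90.0, 70.0],
    [80.0, 40.0, 65.0],
    [95.0, 75.0, 85.0],
])

student = class_scores[0]                  # the student under inspection
exai_values = np.sort(class_scores[:, 0])  # ExAI scores observed in the class

# Repeat the student's row and swap in each observed ExAI score.
what_if = np.tile(student, (exai_values.size, 1))
what_if[:, 0] = exai_values
ice_curve = predict_grade(what_if)

for score, grade in zip(exai_values, ice_curve):
    print(f"ExAI={score:.0f} -> predicted grade {grade:.1f}")
```

The resulting values form one ICE curve: the other subject scores never change, only the ExAI score does.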
Consider N observations $\{(X_S^{(i)}, X_C^{(i)})\}_{i=1}^{N}$, where S is the feature of interest and C is the set of remaining features. A PDP produces a single aggregated curve by averaging the predictions over the observed values of $X_C$. An ICE plot instead draws one curve per observation: for each i, $X_C^{(i)}$ is held fixed at its observed value while $X_S$ is varied over its observed values, and the prediction is plotted against $X_S$. ICE plots therefore give insight into the model at a granular level and reveal the heterogeneity that the average effect shown by a PDP hides.
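The definition above translates almost directly into code. The sketch below assumes the model is available as a function `predict` that maps a 2-D array of instances to predictions; `ice_curves` and `toy_predict` are illustrative names, not part of any library.

```python
import numpy as np

def ice_curves(predict, X, feature, grid=None):
    """Return (grid, curves) where curves[i, j] is the prediction for
    instance i with column `feature` set to grid[j] and the rest unchanged."""
    X = np.asarray(X, dtype=float)
    if grid is None:
        grid = np.unique(X[:, feature])   # observed values of X_S, sorted
    curves = np.empty((X.shape[0], grid.size))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value         # vary X_S, keep X_C fixed
        curves[:, j] = predict(X_mod)
    return grid, curves

def toy_predict(X):
    """Hypothetical model with an interaction between features 0 and 1."""
    return X[:, 0] ** 2 + X[:, 0] * X[:, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))

grid, curves = ice_curves(toy_predict, X, feature=0)
pdp = curves.mean(axis=0)   # averaging the N curves recovers the PDP

print(curves.shape)         # one curve per observation
```

Each row of `curves` is one line in the ICE plot, and the column-wise mean is the corresponding PDP, which makes the relationship between the two plots explicit.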
The number of curves in a general ICE plot can be overwhelming, and the plot can be hard to read at times.
ICE plots come in several variants that make the analysis of a model more informative and take it to a deeper level when needed:
A general ICE plot can be visually hard to read, and it can be difficult to compare two curves that start at different levels. To solve this problem, the curves can be centered so that they all originate from a single point, which makes the differences between them easy to spot. The resulting plot is called the “c-ICE” plot. In practice, choosing the anchor point as the minimum (or maximum) observed value of the feature tends to give the most interpretable plots.
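Centering amounts to subtracting, from each curve, its own value at the anchor point. A minimal sketch, assuming the ICE curves are already stored as a matrix with one row per instance and columns ordered by ascending feature value (`center_ice` and the numbers are illustrative):

```python
import numpy as np

def center_ice(curves, anchor_index=0):
    """Subtract each curve's value at the anchor column so all curves start at 0."""
    curves = np.asarray(curves, dtype=float)
    return curves - curves[:, [anchor_index]]

# Hypothetical ICE matrix: rows are instances, columns follow an ascending grid.
grid = np.array([0.0, 1.0, 2.0, 3.0])
curves = np.array([
    [10.0, 11.0, 12.0, 13.0],   # slope +1
    [20.0, 21.0, 22.0, 23.0],   # same shape, higher level
    [15.0, 14.0, 13.0, 12.0],   # slope -1
])

c_ice = center_ice(curves, anchor_index=0)   # anchor at the smallest grid value
print(c_ice)
```

After centering, the first two curves coincide because they differ only in level, while the third one clearly moves in the opposite direction.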
Derivative ICE plots, called “d-ICE” plots, are useful for investigating the presence of interactions and the direction of change in the prediction with respect to a feature, by estimating the partial derivative of each curve. If the feature $X_S$ does not interact with the other features, the model can be written as the sum of a function of $X_S$ and a function of $X_C$, and its derivative depends on $X_S$ alone:
$$\hat{f}(x) = g(X_S) + h(X_C), \qquad \frac{\partial \hat{f}(x)}{\partial X_S} = g'(X_S)$$
where $g'(X_S)$ is the derivative of the curve with respect to $X_S$. In that case the ICE curves differ only in the level of the prediction and the derivative curves are homogeneous. Heterogeneous derivative curves indicate that the feature interacts with other features.
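The derivative curves can be approximated numerically. The sketch below uses a plain finite difference (`np.gradient`) with no smoothing step, which is a simplifying assumption; `d_ice` and the two toy models are illustrative.

```python
import numpy as np

def d_ice(curves, grid):
    """Finite-difference estimate of the derivative along each ICE curve."""
    return np.gradient(curves, grid, axis=1)

grid = np.linspace(0.0, 2.0, 21)
x_c = np.array([-1.0, 0.5, 2.0])   # hypothetical fixed values of X_C, one per instance

# Additive model f = X_S^2 + X_C: curves differ only in level.
additive = grid[None, :] ** 2 + x_c[:, None]
# Interaction model f = X_S * X_C: the slope depends on X_C.
interacting = grid[None, :] * x_c[:, None]

d_add = d_ice(additive, grid)
d_int = d_ice(interacting, grid)

# Spread of the derivative across instances at each grid point.
print("additive   :", d_add.std(axis=0).max())   # close to 0: curves coincide
print("interacting:", d_int.std(axis=0).max())   # clearly above 0: slopes differ
```

A spread near zero across instances corresponds to homogeneous derivative curves, while a large spread signals an interaction with another feature.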
Advantages
They are intuitive and easy to implement.
They can help uncover interactions between variables.
They allow the model to be analyzed at the granularity of each instance.
Disadvantages
It can be cumbersome to derive insights from ICE plots when many curves are drawn.
They can display the effect of only one feature at a time.
References
Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/. (the images were taken from here)
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. "The Elements of Statistical Learning".