ALE (Accumulated Local Effects) Plot
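Accumulated Local Effects (ALE) plots address a shortcoming of Partial Dependence Plots (PDPs), which do not account for correlation among the features. PDPs and ICE plots average predictions over the marginal distribution of the other features, so when features are correlated they evaluate the model on combinations of feature values that are unlikely or impossible in the data. ALE plots avoid this by working with the conditional distribution of the other features and with differences in predictions rather than raw averages. Because ALE plots remain unbiased when features are correlated, they are a more trustworthy basis for decisions when deploying large machine learning solutions. They can be used with categorical features as well, provided the categories are given an order.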
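To see why correlation is a problem for PDPs, the following Python sketch lists the points at which a PDP for feature x0 would query the model. The helper `pdp_evaluation_points` and the toy data (in which x1 always stays close to x0) are hypothetical illustrations, not part of any library. Every grid value is paired with every observed x1, so many of the queried points combine values that never occur together in the data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pdp_evaluation_points(X: Sequence[Point], grid: Sequence[float]) -> List[Point]:
    """Points at which a PDP for feature x0 queries the model (hypothetical helper)."""
    # A PDP replaces x0 by each grid value for EVERY instance and keeps that
    # instance's own x1, i.e. x1 is taken from its marginal distribution.
    return [(g, x1) for g in grid for _, x1 in X]


# Hypothetical, strongly correlated toy data: x1 is always x0 + 0.05
X = [(i / 10, i / 10 + 0.05) for i in range(11)]
points = pdp_evaluation_points(X, grid=[0.0, 0.5, 1.0])

# In X the two features never differ by more than 0.05, so any queried point
# where they differ by more than 0.5 lies far outside the observed data
unrealistic = [p for p in points if abs(p[0] - p[1]) > 0.5]
print(len(points), len(unrealistic))
```

Here 12 of the 33 queried points are of this unrealistic kind; the model's predictions there are extrapolations that the PDP nevertheless averages in.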
ALE plots take into account that a feature may be correlated and interact with other features, which together determine the predicted value. Instead of averaging predictions over the marginal distribution of the other features, ALE uses their conditional distribution: only instances whose feature value actually lies near the value of interest are used. Concretely, the range of the feature is split into intervals (bins). For every instance inside an interval, the feature value is replaced first by the upper and then by the lower limit of the interval, and the difference between the two predictions is computed. These differences are averaged within each interval, and the averages are summed (accumulated) across the intervals, which gives the ALE estimate.
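A minimal pure-Python sketch of this procedure for a single numeric feature follows. It assumes the model is available as a plain function `predict` that maps one row (a list of feature values) to a number, and that the grid points `edges` are already chosen and sorted; `ale_uncentered` and its parameter names are hypothetical, not a library API. Intervals are treated as (z_{k-1}, z_k], with the lowest edge included in the first interval, and the result is not yet centered.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ale_uncentered(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    j: int,
    edges: Sequence[float],
) -> List[float]:
    """Uncentered first-order ALE of feature j at the upper edge of each interval.

    edges are sorted grid points z_0 < z_1 < ... < z_K; the result has K entries.
    """
    K = len(edges) - 1
    diff_sums = [0.0] * K
    counts = [0] * K
    for row in X:
        x = row[j]
        # Find k with z_{k-1} < x <= z_k (values at or below z_1 go to the first
        # interval, values above z_{K-1} to the last one)
        k = 1
        while k < K and x > edges[k]:
            k += 1
        upper = list(row)
        upper[j] = edges[k]
        lower = list(row)
        lower[j] = edges[k - 1]
        # Local effect: prediction at the upper limit minus prediction at the lower limit
        diff_sums[k - 1] += predict(upper) - predict(lower)
        counts[k - 1] += 1

    # Average the differences within each interval, then accumulate across intervals
    accumulated = []
    total = 0.0
    for s, n in zip(diff_sums, counts):
        if n > 0:  # empty intervals add nothing (an assumption of this sketch)
            total += s / n
        accumulated.append(total)
    return accumulated


# Usage with a hypothetical model f(x0, x1) = x0 * x1
X = [[0.5, 2.0], [0.5, 4.0], [1.5, 1.0]]
print(ale_uncentered(lambda r: r[0] * r[1], X, j=0, edges=[0.0, 1.0, 2.0]))  # prints [3.0, 4.0]
```

Only the other features of instances that actually fall in an interval are used there, which is how the conditional distribution enters the estimate.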
Like centered ICE plots, ALE plots are centered so that the mean effect is zero. ALE plots depend mainly on the differences in predictions computed within each interval, so the number of intervals in the grid also affects the result. The difference in predictions measures the effect the feature has on an instance within a given interval. Consider the first interval, bounded by the grid points $z_{0,j}$ and $z_{1,j}$ of feature $x_j$: for every instance in this interval, the difference between the prediction at the upper and at the lower boundary is computed, and these differences are summed and then divided by the number of instances in the interval. The cumulative sum of these interval averages across the grid gives the (uncentered) ALE estimate, which is then centered.
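The averaging, accumulation and centering steps can be sketched on their own. The hypothetical function `center_ale` below takes the average prediction difference of each interval and the number of instances in each interval, accumulates the averages, and subtracts the mean accumulated effect over all instances. It assumes every instance is assigned the accumulated value at the upper edge of its own interval; other implementations may handle the centering constant differently.

```python
from typing import List, Sequence


def center_ale(mean_diffs: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Accumulate per-interval average differences, then center them (hypothetical helper).

    mean_diffs[k]: average prediction difference of the instances in interval k
    counts[k]: number of instances that fall in interval k
    """
    # Cumulative sum: uncentered ALE at the upper edge of each interval
    accumulated = []
    total = 0.0
    for d in mean_diffs:
        total += d
        accumulated.append(total)

    # Each instance takes the accumulated value of its own interval, so the mean
    # over all instances is a count-weighted mean of the accumulated values
    n = sum(counts)
    mean_effect = sum(a * c for a, c in zip(accumulated, counts)) / n
    return [a - mean_effect for a in accumulated]


# Three intervals with the same average difference but more instances in the last one
print(center_ale([1.0, 1.0, 1.0], [1, 1, 2]))
```

Because the weights are the instance counts, the centered values average to zero over the data, which is what "the mean effect is zero" means here.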
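The estimate of the centered ALE for feature $x_j$ is obtained in two steps. First, the uncentered effect is accumulated over the intervals:

$$\hat{\tilde{f}}_{j,ALE}(x)=\sum_{k=1}^{k_j(x)}\frac{1}{n_j(k)}\sum_{i:\,x_j^{(i)}\in N_j(k)}\left[f\left(z_{k,j},x_{-j}^{(i)}\right)-f\left(z_{k-1,j},x_{-j}^{(i)}\right)\right]$$

It is then centered by subtracting its mean over all $n$ instances:

$$\hat{f}_{j,ALE}(x)=\hat{\tilde{f}}_{j,ALE}(x)-\frac{1}{n}\sum_{i=1}^{n}\hat{\tilde{f}}_{j,ALE}\left(x_j^{(i)}\right)$$

Here $z_{0,j}<z_{1,j}<\dots$ are the grid points, $N_j(k)=(z_{k-1,j},z_{k,j}]$ is the $k$-th interval, $n_j(k)$ is the number of instances in it, $k_j(x)$ is the index of the interval containing $x$, and $x_{-j}^{(i)}$ denotes the values of all other features of instance $i$.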
ALE plots are often preferable to PDPs because they remain unbiased when features are correlated. The value of an ALE plot at a given feature value can be interpreted as the effect of the feature at that value compared with the average prediction of the data. Quantiles of the feature are commonly used as grid points so that each interval contains roughly the same number of instances; however, quantile-based intervals can differ greatly in width, and if the feature is highly skewed this can produce strange-looking ALE plots.
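The trade-off between quantile-based and equal-width grids can be illustrated on a small, hypothetical right-skewed feature. The sketch uses `statistics.quantiles` from the Python standard library (with `method="inclusive"`) to build the quantile edges; `quantile_edges`, `equal_width_edges` and `interval_counts` are illustrative helpers, not a library API. Quantile edges spread the instances fairly evenly across intervals (ties in the data keep the counts from being exactly equal) while the interval widths vary widely; equal-width edges do the opposite.

```python
import statistics
from typing import List, Sequence


def quantile_edges(values: Sequence[float], n_intervals: int) -> List[float]:
    # statistics.quantiles returns the n_intervals - 1 interior cut points;
    # add min and max so the edges cover the whole range of the feature
    cuts = statistics.quantiles(values, n=n_intervals, method="inclusive")
    # Repeated values can produce duplicate edges, so deduplicate and sort
    return sorted(set([min(values)] + cuts + [max(values)]))


def equal_width_edges(values: Sequence[float], n_intervals: int) -> List[float]:
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    return [lo + k * step for k in range(n_intervals + 1)]


def interval_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    # Count instances per interval (z_{k-1}, z_k]; the lowest edge belongs to the first one
    K = len(edges) - 1
    counts = [0] * K
    for v in values:
        k = 1
        while k < K and v > edges[k]:
            k += 1
        counts[k - 1] += 1
    return counts


# Hypothetical right-skewed feature: most values are small, a few are very large
x = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 40, 100]
q = quantile_edges(x, 4)
w = equal_width_edges(x, 4)
print("quantile edges:", q, "counts:", interval_counts(x, q))
print("equal-width edges:", w, "counts:", interval_counts(x, w))
```

With the quantile grid the last interval spans from 9.25 to 100, so the ALE curve there rests on only three instances spread over a very wide range; with the equal-width grid one interval is empty and ten of the twelve instances share the first one.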
Advantages:
- They are easy to plot and understand, since they are centered at 0.
- They account for the dependence between features, so they remain unbiased when features are correlated.
Disadvantages:
- Interpretation of the plots can be difficult if the features are highly correlated.
- Their biggest disadvantage compared with PDPs is that they are not accompanied by ICE curves, so they cannot be interpreted at the level of individual instances.