PDP (Partial Dependence Plot)


Introduction

Understanding the functional relationship between the predictor variables and the predicted outcome is often difficult with a black-box model. Partial Dependence Plots (PDPs) were introduced by Friedman in 2001, who faced this challenge while interpreting the gradient boosting machines he was developing. It is usually easy to calculate how important a variable is, but hard to understand its individual effect on the prediction. PDPs help solve this problem by providing a way to calculate and visualize that effect.

What is PDP?

PDPs visualize the marginal effect of a predictor variable on the predicted outcome by plotting the average model output at different levels of that predictor. This gives an idea of the effect a predictor variable has on the outcome on average.

Assume a class of students who have just taken their exams. Each student's cumulative grade depends on how they perform in each subject. The academic advisor for this class is interested in how the average grade of the class is affected by one subject, 'ExAI', where student performance has been mixed. So he calculates the average class grade at each score level the students achieved in ExAI. To understand this better, he can plot these averages on a graph, which shows him how the average class grade changes across the various ExAI score levels.

This is exactly how Partial Dependence Plots work: a selected predictor variable's contribution to the outcome is calculated as its average marginal effect, averaging out the influence of the other variables in the model. Plotting these values on a chart also shows the direction in which the variable affects the outcome.
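A minimal sketch of this averaging procedure, assuming a scikit-learn-style model with a `predict` method; the `make_friedman1` toy data and the `partial_dependence_1d` helper with its parameters are illustrative choices, not part of the original text:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# Toy data and a black-box model (a gradient boosting machine,
# as in Friedman's original setting).
X, y = make_friedman1(n_samples=500, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence_1d(model, X, feature, grid_points=20):
    """Average model prediction at fixed levels of one feature."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_points)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value             # force every row to this level
        averages.append(model.predict(X_mod).mean())  # average the outcome
    return grid, np.asarray(averages)

grid, pd_values = partial_dependence_1d(model, X, feature=0)
```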

As discussed earlier, PDPs visualize the average marginal effect of a predictor on the model output. To make this precise, consider a set S that is a subset of the p predictor variables:

S⊂{X1, X2,…, Xp}

The set S contains the variables whose partial dependence on the model output we want to examine, while the complement set C holds the remaining predictors, which are averaged over rather than held fixed. The model output can then be written as f(X) = f(XS, XC), and the partial dependence function of f on the predictors XS is

$$f_S(x_S) = \mathbb{E}_{X_C}\big[f(x_S, X_C)\big] = \int f(x_S, x_C)\, dP(x_C)$$

Since the true f and the distribution dP(XC) are not known, this is estimated by the empirical average

$$\hat{f}_S(x_S) = \frac{1}{N} \sum_{i=1}^{N} f(x_S, x_{C_i})$$

where {XC1, XC2,…, XCN} are the values of XC observed in the dataset. Evaluating this function over a range of values of XS yields a set of ordered pairs of partial dependence values. Friedman proposed plotting the line joining these coordinates, which gives the Partial Dependence Plot. By construction, the plot shows only the marginal dependence between the selected variables and the predicted outcome.
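For reference, scikit-learn ships this estimator as `sklearn.inspection.partial_dependence`. A short sketch reusing `model` and `X` from the example above; note that the key holding the grid is named `grid_values` in recent scikit-learn releases and `values` in older ones:

```python
from sklearn.inspection import partial_dependence

# Empirical estimate of the partial dependence of feature 0,
# i.e. (1/N) * sum_i f(x_S, x_Ci) evaluated over a grid of x_S.
result = partial_dependence(model, X, features=[0],
                            kind="average", grid_resolution=20)
grid = result["grid_values"][0]  # called "values" in scikit-learn < 1.3
pd_curve = result["average"][0]  # one averaged curve per requested feature
```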

Pros

  • PDPs are simple, easy to understand, and can be explained to a non-technical audience without difficulty

  • They can be used to compare models to decide which model works best for a use case

  • They are intuitive and easy to implement

Cons

  • PDPs assume by default that the features are uncorrelated

  • They can only plot the averaged marginal dependence function and cannot show individual effects; this can be seriously misleading, for example when half the data points show a positive effect and the other half an equal and opposite negative effect, so the averaged curve looks flat

  • They also assume that there are no interactions between the variables, which is highly unlikely in the real world

  • Though interaction effects can be plotted, they are limited to second-order (two variables at a time)

Visualizations
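
A sketch of how such plots can be generated with scikit-learn's `PartialDependenceDisplay`, reusing `model` and `X` from the earlier examples; the choice of features is arbitrary and for illustration only:

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One-way PDPs for features 0 and 1, plus a second-order
# interaction plot for the pair (0, 1).
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1, (0, 1)])
plt.tight_layout()
plt.show()
```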

References

  1. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. "The Elements of Statistical Learning". Springer.

  2. Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/.
