This article is Part 3 of a three-part series. In the previous article, I walked through example graphs for validating linear regression models. In this article, we will take a closer look at a practical application for Logistic Regression.
Conventions Used for this Article
The following subsection category names have been defined based on the preferences of the audience. The nomenclature was selected based on my favorite race of humanoids in the Star Trek universe.
Includes deep technical and academic content for ML practitioners.
Includes content at a high level for managers and other business-level professionals who may work with ML practitioners.
All code used to produce the visuals in this article can be cloned or downloaded from https://github.com/jbonfardeci/model-validation-blog-post.
Validating Classification Models
Logistic Regression and Multiple Logistic Regression are types of classification models for two or more labels of a target variable. For classification models, the convention is to employ a goodness-of-fit (GoF) test to determine if the model has been specified correctly. If the GoF test results in a p-value (probability value) that is less than the significance level, say alpha=0.05, we reject the model. Otherwise, we fail to reject it, which by convention is treated as accepting the model.
The GoF test most commonly applied to classification models is the Hosmer-Lemeshow (HL) test. But the HL test has serious problems. In particular, it is prone to giving false negatives or false positives for GoF after even slight changes to the test’s arbitrary hyperparameter for group size.
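To make the grouping sensitivity described above concrete, here is a minimal sketch of the HL statistic computed for several group counts. It is not the code from the article's repository: the function name `hosmer_lemeshow`, the synthetic data, and the slightly miscalibrated `pred` probabilities are all assumptions made for illustration. The sketch sorts observations by predicted probability, splits them into nearly equal-sized groups, and compares the statistic to a chi-square distribution with (groups - 2) degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Return (statistic, p_value) for a Hosmer-Lemeshow style test.

    Hypothetical helper: observations are sorted by predicted probability
    and split into n_groups bins of (nearly) equal size.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    order = np.argsort(y_prob)
    bins = np.array_split(order, n_groups)

    stat = 0.0
    for idx in bins:
        n = len(idx)
        observed = y_true[idx].sum()   # observed events in the bin
        expected = y_prob[idx].sum()   # expected events in the bin
        mean_p = expected / n
        # Squared difference scaled by the binomial variance of the bin
        stat += (observed - expected) ** 2 / (n * mean_p * (1.0 - mean_p))

    p_value = chi2.sf(stat, df=n_groups - 2)
    return stat, p_value


# Synthetic data (an assumption, not the article's data set)
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, true_p)

# Stand-in for a fitted model whose probabilities are slightly off
pred = 1.0 / (1.0 + np.exp(-(0.4 + 1.1 * x)))

for g in (5, 8, 10, 12, 15, 20):
    stat, p = hosmer_lemeshow(y, pred, n_groups=g)
    print(f"groups={g:2d}  chi2={stat:7.2f}  p={p:.4f}")
```

Running the loop prints one p-value per group count for the same model and the same data, so you can see for yourself how much the accept/reject decision depends on a choice that has no principled default.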
Furthermore, the HL test has been shown to produce wild swings in p-values when applied to large data sets.
The Hosmer-Lemeshow test detected a statistically significant degree of miscalibration in both models, due to the extremely large sample size of the models, as the differences between the observed and expected values within each group are relatively small.
~ Journal of Palliative Medicine. Volume 12, Number 2, 2009.
Per the quote above, the errors were relatively small, meaning the model explained the target variable Y reasonably well. But the HL GoF test said otherwise! This is a false positive, or what’s known as a Type I Error. In other words, the test rejected the null hypothesis (H0) of a good fit when it should not have.
While we described marginal model plots in the context of linear regression models, they also work very well for classification models that predict the probability of an observation belonging to a class. This applies to popular classifier models such as logistic regression, decision tree, random forest, boosted trees, and Support Vector Machine. To create a marginal model plot for classification models, we can utilize the same function described for linear regression models. Figures 6a-6c below show the result for a two-class logistic regression model.
In Figures 6a-6c below, we overlay the model’s predicted probability for each observation on the actual Y values on the Y-axis, with any one of the continuous numerical predictors on the X-axis. Even though Y consists of only finite integer values (0, 1, 2, …n) indicative of class labels, the LOESS (aka LOWESS) function “smooths” both the Y and predicted Y (ŷ) values so we can compare apples to apples.
Marginal Model Plot for Y & ŷ by X1
Figure 6a. X1 is a very good predictor of Y.
Marginal Model Plot for Y & ŷ by X2
Figure 6b. X2 is a good predictor of Y except for values between ~1.0 and ~3.5.