Note that there is not really a unified terminology for residual analysis in GLMs. I will try to use the most common names for these residuals in introducing the basics and point out any deviations from the norm both here and in the SciPy documentation.
In general, residuals should be defined so that their distribution is approximately normal. This is achieved through the use of linear equations, transformed linear equations, and deviance contributions. Below you will find the five most common types of residuals: four that rely mainly on transformations and one that relies on deviance contributions, though there are nine or more distinct types of residuals in the literature.
The response residuals are simply the observed response values minus the predicted values.
In a classical linear model these are just the familiar residuals

$$r_i = y_i - x_i\hat{\beta}$$

For a GLM, however, these become

$$r_i^R = y_i - g^{-1}(x_i\hat{\beta})$$

where $g$ is the link function that makes our model's linear predictor comparable to the response vector.
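As a concrete sketch, assuming a hypothetical Poisson model with a log link (so the inverse link $g^{-1}$ is `np.exp`), and using made-up values rather than a real fit, the response residuals are computed directly:

```python
import numpy as np

# Toy data: observed counts and a fitted linear predictor X @ beta_hat
# (values are illustrative, not from an actual model fit)
y = np.array([2., 4., 3., 8.])
eta = np.array([0.7, 1.4, 1.2, 2.0])

mu = np.exp(eta)          # inverse link for the log link: g^{-1}(eta) = exp(eta)
resid_response = y - mu   # response residuals: y - g^{-1}(X @ beta_hat)
```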
It is common in GLMs, however, for the residuals to deviate greatly from the classical assumptions needed in residual analysis, so in addition we have these other residuals, which attempt to mitigate those deviations.
The Pearson residuals are a version of the response residuals, scaled by the standard deviation of the prediction:

$$r_i^P = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}$$

The name comes from the fact that the sum of the squared Pearson residuals for a Poisson GLM equals Pearson's $\chi^2$ statistic, a goodness-of-fit measure.
The scaling allows these residuals to be plotted against an individual predictor or the outcome to identify outliers. The problem with these residuals, though, is that while they are asymptotically normally distributed, in practice they can be quite skewed, leading to a misinterpretation of the actual dispersion.
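A minimal sketch for an assumed Poisson family, where the variance function is $V(\mu) = \mu$ (the observed counts and fitted means below are made up for illustration):

```python
import numpy as np

# Toy observed counts and fitted means (not from a real fit)
y = np.array([2., 4., 3., 8.])
mu = np.array([2.5, 3.2, 3.9, 7.1])

# Pearson residuals: response residuals scaled by the square root
# of the variance function, V(mu) = mu for the Poisson family
resid_pearson = (y - mu) / np.sqrt(mu)

# The sum of the squared Pearson residuals is Pearson's chi^2 statistic
chi2 = np.sum(resid_pearson**2)
```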
The working residuals are the difference between the working response and the linear predictor at convergence (i.e., at the last step of the iterative fitting process).
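As a sketch, again assuming a Poisson model with a log link and toy values: the working response is $z = \eta + (y - \mu)\,d\eta/d\mu$, and for the log link $d\eta/d\mu = 1/\mu$, so the working residual $z - \eta$ reduces to $(y - \hat{\mu})/\hat{\mu}$:

```python
import numpy as np

# Toy observed counts and fitted means at convergence (illustrative only)
y = np.array([2., 4., 3., 8.])
mu = np.array([2.5, 3.2, 3.9, 7.1])

# Working residual z - eta = (y - mu) * d(eta)/d(mu);
# for the log link d(eta)/d(mu) = 1/mu
resid_working = (y - mu) / mu
```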
Anscombe (1960, 1961) describes a general transformation $A(\cdot)$ applied in place of $y$ and $\mu$ so that the resulting residuals are as close to a normal distribution as possible. (Note: "There is a maddeningly great diversity of the forms that the Anscombe residuals take in the literature." I have included the simplest, as described below. (Gill 54, emphasis added)) The function is given by

$$A(\cdot) = \int \frac{d\mu}{V^{1/3}(\mu)}$$
This is done for both the response and the predictions. The difference is then scaled by dividing by

$$A'(\hat{\mu})\sqrt{V(\hat{\mu})}$$

so that the Anscombe residuals are

$$r_i^A = \frac{A(y_i) - A(\hat{\mu}_i)}{A'(\hat{\mu}_i)\sqrt{V(\hat{\mu}_i)}}$$
The Poisson distribution has variance function $V(\mu) = \mu$, so that $A(\mu) = \tfrac{3}{2}\mu^{2/3}$ and its Anscombe residuals are simply

$$r_i^A = \frac{\tfrac{3}{2}\left(y_i^{2/3} - \hat{\mu}_i^{2/3}\right)}{\hat{\mu}_i^{1/6}}$$
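A quick sketch of this Poisson formula with toy observed counts and fitted means (illustrative values, not a real fit):

```python
import numpy as np

# Toy observed counts and fitted means (illustrative only)
y = np.array([2., 4., 3., 8.])
mu = np.array([2.5, 3.2, 3.9, 7.1])

# Poisson Anscombe residuals: (3/2) * (y^(2/3) - mu^(2/3)) / mu^(1/6)
resid_ansc = 1.5 * (y**(2/3.) - mu**(2/3.)) / mu**(1/6.)
```

Note that the sign of each residual matches the sign of the corresponding response residual, since the transformation is monotone.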
Easy, right? Sure was, until I ran into the binomial distribution. The Anscombe residuals are built up in a different way for the binomial family. Indeed, the McCullagh and Nelder text does not even provide the general formula for the binomial Anscombe residuals and refers the reader to Anscombe (1953) and Cox and Snell (1968). The problem is that following this transformation for the binomial distribution leads to a rather nasty solution involving the hypergeometric function $_2F_1$, or equivalently the incomplete beta function multiplied by the beta function, as shown by Cox and Snell (1968).
Gill writes "A partial reason that Anscombe residuals are less popular than other forms is the difficulty in obtaining these tabular values." The tabular values to which he is referring are found in Cox and Snell (1968) p. 260. It is a table of the incomplete beta function that was tabulated numerically for an easy reference. How difficult would it be to get this table with NumPy and SciPy?
```python
import numpy as np
from scipy import special

# Regularized incomplete beta function with a = b = 2/3
betainc = lambda x: special.betainc(2/3., 2/3., x)

# Grid of x values from 0 to .499 in steps of .001, laid out
# in 50 rows of 10 as in Cox and Snell's table
table = np.arange(0, .5, .001).reshape(50, 10)
results = []
for i in table:
    results.extend(betainc(i))
results = np.asarray(results).reshape(50, 10)
```
Now the Anscombe residuals for the binomial distribution are

$$r_i^A = \frac{\Phi(y_i) - \Phi(\hat{\mu}_i)}{\left[\hat{\mu}_i(1 - \hat{\mu}_i)\right]^{1/6}}, \qquad \Phi(x) = B\!\left(\tfrac{2}{3},\tfrac{2}{3}\right) I_x\!\left(\tfrac{2}{3},\tfrac{2}{3}\right)$$

where $n$ is the number of trials for each observation (i.e., $1$, $n$, or $n_i$).
To implement this in the GLM binomial family class, I defined an intermediate function cox_snell similar to the above. It now looks like
```python
import numpy as np
from scipy import special

def resid_anscombe(self, Y, mu):
    # Cox and Snell's transformation: the regularized incomplete beta
    # function scaled by the beta function, both with a = b = 2/3
    cox_snell = lambda x: special.betainc(2/3., 2/3., x) * special.beta(2/3., 2/3.)
    return np.sqrt(self.n) * (cox_snell(Y) - cox_snell(mu)) / \
        (mu**(1/6.) * (1 - mu)**(1/6.))
```
We multiply the above formula by np.sqrt(self.n), the square root of the number of trials, in order to correct for the possible use of proportional outcomes Y and mu.
Also, see the ticket here for a note about the incomplete beta function.
One other important type of residual in GLMs is the deviance residual. The deviance residual is the most general and also the most useful of the GLM residuals. The IRLS algorithm (as will be shown in a future post) depends on the convergence of the deviance function. The deviance residual, then, is just each observation's increment to the overall deviance:

$$r_i^D = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{d_i}$$

where the deviance contributions $d_i$ are defined for each family.
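As an illustrative sketch with toy numbers: for the Poisson family the deviance contribution is $d_i = 2\left[y_i\log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\right]$, and the deviance residual is its signed square root:

```python
import numpy as np
from scipy.special import xlogy

# Toy observed counts and fitted means (illustrative only)
y = np.array([2., 4., 3., 8.])
mu = np.array([2.5, 3.2, 3.9, 7.1])

# Poisson deviance contributions; xlogy(a, b) = a*log(b) but returns 0
# when a = 0, which handles zero counts safely
d = 2 * (xlogy(y, y / mu) - (y - mu))

# Deviance residuals: signed square roots of the contributions
resid_dev = np.sign(y - mu) * np.sqrt(d)

# The squared deviance residuals sum to the model deviance
deviance = np.sum(d)
```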
Note that most of these residuals also come in variations such as modified, standardized, studentized, and adjusted.
Anscombe, F. J. (1953). "Contribution to the discussion of H. Hotelling's paper." Journal of the Royal Statistical Society B, 15, 229-30.

Anscombe, F. J. (1960). "Rejection of outliers." Technometrics, 2.

Anscombe, F. J. (1961). "Examination of residuals." Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press.

Cox, D. R. and Snell, E. J. (1968). "A generalized definition of residuals." Journal of the Royal Statistical Society B, 30, 248-65.