Probit and logit models are scholars’ two basic tools for dichotomous (0-1) dependent variables (DVs). Much like OLS, these models can accommodate independent variables (IVs) at any level of measurement. Probit and logit are functionally identical for most users, with the preferred model varying by discipline (political science favors probit). Their key difference is an assumption about the underlying distribution of the errors; coefficients estimated by either method will be substantively identical, even though logit coefficients are numerically about 1.8 times as large as probit ones. You will sometimes see references to “logistic regression,” but you should be aware that the underlying estimation method is not least squares regression but maximum likelihood estimation (MLE).
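That 1.8 scaling factor comes from the spread of the two assumed error distributions: the standard logistic distribution has a standard deviation of π/√3 ≈ 1.81, versus 1 for the standard normal. A quick sketch (all numbers here are illustrative, not from any real model) shows that stretching the probit index by that factor nearly reproduces the logistic CDF:

```python
import numpy as np
from scipy.stats import norm, logistic

# The standard logistic distribution's standard deviation is
# pi / sqrt(3) ~= 1.81, versus 1 for the standard normal -- the source
# of the rough 1.8 scaling between logit and probit coefficients.
scale = np.pi / np.sqrt(3)
z = np.linspace(-3, 3, 61)

# Stretching the probit index by that factor nearly reproduces the
# logistic CDF; the two curves differ by only a few hundredths.
max_gap = np.max(np.abs(logistic.cdf(z * scale) - norm.cdf(z)))
print(max_gap < 0.05)  # → True
```

The match is not exact, which is why the two models can occasionally disagree in the tails, but in practice the substantive conclusions are the same.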
How do limited DV models work?
Probit and logit, and other models in the family of limited DV tools, respond to the problems created when we try to use OLS regression on a DV that is not continuous (that is, its values are limited). OLS is not the best model for limited DVs because the limited nature of the DV produces a number of serious statistical problems that make the OLS results highly questionable. We know that OLS predicts a line that best fits the points by minimizing squared errors. (See the first figure on http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter1/statalog1.htm for a visualization.)
With limited DV data, however, OLS has two significant problems. First, the linear estimate of the relationship will produce predictions that are “out of bounds” – that is, they will be less than 0 or greater than 1, and since the range of our DV is only [0, 1] that is a problem. Second, and more problematic from an econometric standpoint, the linear model will have significant heteroskedasticity (unequal variance of errors). At extreme IV values, the errors – the distance between the line and the data point – will be systematically smaller than those in the middle of the IV range. If you review the figure linked above, you can clearly see this. The result of both of these problems is that OLS estimates are not credible – their standard errors in particular are likely to be wildly incorrect – and most statistical tests and ‘fixes’ for the violations don’t work.
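To see the out-of-bounds problem concretely, here is a toy sketch (with made-up data, not the figure from the linked page) that fits a straight line to a 0-1 DV by OLS:

```python
import numpy as np

# Made-up data: the DV flips from 0 to 1 partway through the IV range.
x = np.arange(1, 21, dtype=float)   # IV running from 1 to 20
y = (x > 10).astype(float)          # 0-1 DV

b1, b0 = np.polyfit(x, y, 1)        # OLS slope and intercept
fitted = b0 + b1 * x

# The straight line escapes the [0, 1] range at both ends.
print(fitted.min() < 0, fitted.max() > 1)  # → True True
```

The fitted “probabilities” below 0 and above 1 are nonsense on their face, and the residuals shrink systematically at the extremes of x, which is exactly the heteroskedasticity problem described above.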
Probit and logit avoid these problems by assuming that the relationship between the DV and the IVs follows a cumulative distribution function (CDF) of some underlying (latent) variable. CDFs are S-shaped curves, and they are useful here for two reasons. First, since they capture probability, they are naturally bounded at 0 and 1, so we cannot get an out-of-bounds prediction. Second, CDFs are also naturally nonlinear, with probabilities changing fastest in the middle of the range and more slowly at the ends. This fits what we intuitively understand about predicting whether something will happen: there’s a tipping point somewhere in the middle where the event goes from unlikely to likely.
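A few values of the two CDFs make the bounding and the S-shape concrete; this is just scipy’s standard normal and logistic CDFs evaluated at a handful of points, not output from any fitted model:

```python
import numpy as np
from scipy.stats import norm, logistic

# Values of the probit (normal) and logit (logistic) CDFs at a few
# points on the latent scale.
z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(np.round(norm.cdf(z), 3))      # probit link
print(np.round(logistic.cdf(z), 3))  # logit link

# Both stay strictly inside (0, 1), and their slope (the density)
# peaks at z = 0: changes near the tipping point matter most.
print(norm.pdf(0.0) > norm.pdf(2.0))  # → True
```

Notice that moving from z = -3 to z = -1 changes the probability far less than moving from z = -1 to z = 1: the same one-unit change matters most near the middle.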
That intuition is exactly the idea behind the latent variable. What we observe is a set of limited DV values, where y equals 0 (no) or 1 (yes). What’s really going on, however, is that there was some underlying probability of observing a 0 or a 1, and some threshold or tipping point exists in the data where we become less likely to see 0 and more likely to see 1. This underlying probability – called y* – is what the CDF captures. You’ve seen this in action quite a lot, every time you look at a weather prediction. We observe whether rain does not or does occur (0 or 1), but the weather report gives a chance of rain ranging from 0% to 100%. This probability of rain is y* and the actual observation of rain is the 0-1 DV. Intuitively, we know that 10% and 20% chance of rain means it’s highly unlikely, and 80% and 90% are almost certain. In the middle, though, in the 40-60% range, the probability of observing rain is changing rapidly. In the space of only 20% — the same distance as between 0% and 20% — we go from being unlikely to observe a 1 to being quite likely to observe a 1.
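The latent-variable story can be simulated directly. This sketch (with arbitrary made-up coefficients) generates a hidden y*, records only whether it crossed the threshold, and shows the observed frequency of 1s rising along an S-curve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up latent model: y* = 0.5 + 1.0*x + noise; we observe only
# whether y* crossed zero, never y* itself.
n = 10_000
x = rng.normal(size=n)
y_star = 0.5 + 1.0 * x + rng.normal(size=n)
y = (y_star > 0).astype(int)

# The observed share of 1s climbs with x, slowly at the extremes and
# quickly in the middle -- the S-curve the CDF is meant to capture.
for lo, hi in [(-2.0, -1.0), (-0.5, 0.5), (1.0, 2.0)]:
    band = (x > lo) & (x < hi)
    print(f"x in ({lo}, {hi}): share of 1s = {y[band].mean():.2f}")
```

This is exactly the weather-report situation: y_star plays the role of the underlying chance of rain, and y is whether you actually got wet.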
Probit and logit models therefore predict the DV on the y* scale and leave the user to convert that into predicted probabilities using the appropriate CDF (the normal CDF for probit, the logistic CDF for logit). This is why we cannot simply look at probit and logit coefficients and interpret their marginal effects directly as we do for OLS: the predicted DV value is not on the same scale as the observed DV. Moreover, since the marginal effects change rapidly in some parts of the IV distribution, we must calculate the predicted probability at any one IV value while holding all the other IVs fixed. Typically, we fix the other variables at their means, medians, or modes and examine the marginal effect of changing the IV of interest from its minimum to its maximum, increasing or decreasing it by one standard deviation, or other similar strategies.
This is a bit of a pain to do by hand. My favorite way to do this in Stata is to use Gary King’s CLARIFY package (available from http://gking.harvard.edu/software). For those who prefer manual computation, Princeton University’s Data and Statistical Services group has a nice walk-through of how to do this (available on http://dss.princeton.edu/training/). Tutorials are also available online for other computing platforms, including R; just Google it.
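If you want to see what those tools are doing under the hood, the conversion step itself is only a few lines. This sketch uses made-up probit coefficients and a made-up mean for the second IV, purely for illustration:

```python
import numpy as np
from scipy.stats import norm

# Made-up probit coefficients (intercept, x1, x2) standing in for
# whatever your software estimated.
b = np.array([-1.0, 0.8, 0.5])
x2_mean = 0.4  # hold the second IV fixed at its (made-up) mean

# Predicted probabilities as x1 moves across its range.
for x1 in (0.0, 1.0, 2.0):
    xb = b[0] + b[1] * x1 + b[2] * x2_mean  # prediction on the y* scale
    print(f"x1 = {x1}: Pr(y=1) = {norm.cdf(xb):.3f}")

# First difference: the effect of moving x1 from 0 to 2, x2 fixed.
lo = norm.cdf(b[0] + b[1] * 0.0 + b[2] * x2_mean)
hi = norm.cdf(b[0] + b[1] * 2.0 + b[2] * x2_mean)
print(f"change in Pr(y=1): {hi - lo:.3f}")
```

The key move is that the coefficient on x1 (0.8) never appears in the answer directly: it only matters after being pushed through the CDF at particular values of the other IVs.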
As a note: probit and logit do not work by minimizing squared errors. They use a technique called maximum likelihood estimation, which is entirely different: it finds the coefficient values that make the observed pattern of 0s and 1s most likely.
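For the curious, here is a minimal sketch of what MLE does for a logit model, using simulated data with arbitrary true coefficients (0.5 and 1.5) and scipy’s general-purpose optimizer rather than any canned regression routine:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # the logistic CDF

rng = np.random.default_rng(42)

# Simulated data with arbitrary true coefficients: intercept 0.5, slope 1.5.
n = 5_000
x = rng.normal(size=n)
y = rng.binomial(1, expit(0.5 + 1.5 * x))
X = np.column_stack([np.ones(n), x])

def neg_log_lik(beta):
    # Probability of each observed 0/1 under candidate coefficients,
    # clipped to avoid log(0) during the optimizer's search.
    p = np.clip(expit(X @ beta), 1e-9, 1 - 1e-9)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# MLE: pick the coefficients that make the observed 0s and 1s most
# likely. No squared errors are minimized anywhere.
beta_hat = minimize(neg_log_lik, x0=np.zeros(2)).x
print(np.round(beta_hat, 2))  # should land near the true (0.5, 1.5)
```

Canned probit and logit commands do essentially this, just with smarter optimizers and analytic standard errors.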
Useful Resources
http://www3.nd.edu/~rwilliam/stats2/l81.pdf – Richard Williams’s fantastic (though slightly more technical) explanation of linear probability models and their problems. Lots of great visualizations and clear examples. Parts 2 and 3 of the logit lecture are also quite helpful; advanced users may particularly benefit from the OLS & logit comparison chart at the end of part 3.
http://rt.uits.iu.edu/visualization/analytics/docs/cdvm-docs/cdvm.pdf – Indiana University’s guide to dichotomous DV estimation and marginal effects calculation
John Aldrich and Forrest Nelson, “Linear probability, logit, and probit models.” Sage Quantitative Applications in the Social Sciences – https://a.co/d/c6FRcIN (your institution may have access to this via the Sage Research Methods portal)