Nonlinear relationships are ubiquitous in political science, in both theory and data. Remember that we are talking about relationships: we must have specified both an independent and a dependent variable. The book covered the most common instance we observe in data, the nonlinear relationship between GDP and just about anything. This essay covers some other recurring patterns that you might need to address, but it is by no means comprehensive or a substitute for your own theorizing.
Simple Nonlinearities
Two common types of nonlinearity exist: the S-shape and the U-shape. (Both can also be inverted: a reverse S starts high and ends low, and an upside-down U looks like a hump.) In both cases, what happens in the middle of the X variable’s range is different from what happens at the ends. In the S model, Y is relatively stable or slowly changing at a low (high) value when X is low, and relatively stable or slowly changing at a high (low) value when X is high; most of the change happens in between. For an S-shaped example, consider a person’s degree of partisan identification as the X variable and the number of hours volunteered for a party or campaign as the Y variable. At low levels of partisan attachment, we expect little or no volunteering. At some point on that attachment scale, however, hours volunteered will start to climb. They will eventually level out, though, because there is a maximum number of hours any individual has available for volunteering (or anything else; we all have only 24 hours in a day).
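To make the flat-steep-flat pattern concrete, the sketch below uses a logistic curve as a stand-in for the partisanship and volunteering example. The logistic form, the 0 to 10 attachment scale, the 20-hour weekly ceiling, and the parameter values are all hypothetical choices for illustration, not estimates from any data. The code prints how many extra hours each one-unit step in attachment adds.

```python
import numpy as np

def hours_volunteered(attachment, ceiling=20.0, midpoint=5.0, steepness=1.5):
    """Hypothetical S-curve: expected weekly volunteer hours given partisan
    attachment on an assumed 0-10 scale. All parameter values are illustrative."""
    return ceiling / (1.0 + np.exp(-steepness * (attachment - midpoint)))

attachment = np.linspace(0, 10, 11)   # attachment levels 0, 1, ..., 10
hours = hours_volunteered(attachment)
slopes = np.diff(hours)               # extra hours gained from each one-unit step

for x, s in zip(attachment[:-1], slopes):
    print(f"{x:4.1f} -> {x + 1:4.1f}: +{s:5.2f} hours")
```

The steps near the middle of the scale add several hours each, while the steps at either end add only a small fraction of an hour; that changing slope is what a single straight line cannot capture.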
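Fitting a straight line to an S-shaped or backward-S-shaped relationship leaves a telltale pattern in the errors (residuals): they are not randomly distributed around the regression line but are systematically above it in some regions of X and systematically below it in others. Strictly speaking, that pattern signals a misspecified functional form, though it often shows up in residual plots alongside heteroskedasticity, an uneven spread of errors across values of X. The standard fixes for heteroskedasticity (robust standard errors or, depending on your particular model context, clustered standard errors) make your standard errors more trustworthy, but they do not change the estimated coefficients, so the straight line will still misdescribe the curve. In some cases, you may need to transform the DV and/or the IV, for example by taking logs; this typically happens in advanced cases where other problems exist simultaneously, so talk to your professor or methodologist before taking this route.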
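The sketch below simulates hypothetical S-shaped data (a logistic curve plus noise), fits an ordinary least-squares line with NumPy, and shows two things: the average residual changes sign across regions of X, and a heteroskedasticity-robust (HC1) standard error can be computed from those residuals. The data-generating values are invented for illustration, and the hand-coded HC1 formula is only there to show what the correction does; in practice you would use your statistical package’s built-in robust-standard-error option.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical S-shaped data: logistic mean plus normal noise (illustrative values only).
n = 300
x = rng.uniform(0, 10, n)
y = 20.0 / (1.0 + np.exp(-1.5 * (x - 5.0))) + rng.normal(0, 1.0, n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Mean residual in each quarter of the X range. If the line fit the curve,
# these would all hover near zero; with an S-curve their signs alternate.
edges = [0.0, 2.5, 5.0, 7.5, 10.0]
mean_resid = [resid[(x >= lo) & (x < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])]

# Classical and HC1 (heteroskedasticity-robust) standard errors.
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
se_classical = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (n - k))
meat = X.T @ (X * resid[:, None] ** 2)
se_hc1 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv) * n / (n - k))

print(f"slope: {beta[1]:.3f}")
print("mean residual by quarter of X:", [f"{m:+.2f}" for m in mean_resid])
print(f"slope SE, classical: {se_classical[1]:.4f}; HC1 robust: {se_hc1[1]:.4f}")
```

In this simulation the mean residuals should alternate in sign across the four quarters of X, and switching from classical to robust standard errors changes only the reported uncertainty, not the slope itself.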
The U-shape and its inverse behave a bit differently. For this form of nonlinearity, the value of Y is high (low) at both low and high values of X, and Y is low (high) at the center of the X distribution. For a U-shaped example, consider balance of power theory, which argues that the ratio of military capabilities between two states (X variable) affects the probability of war (Y variable). Suppose X is measured as state A’s share of the pair’s combined capabilities, so that it runs from 0 to 1. According to balance of power theory, war is most likely (Y is high) when one state has significantly more power than the other: the ratio is very small (B has more power than A) or very large (A has more power than B). In the middle, however, the two states have equal or nearly equal levels of capabilities (the ratio is about 0.5), and so neither side can be guaranteed to win a war. War is thus least likely (Y is very low) when X is near 0.5.
For an inverted-U example, consider states responding to an economic crisis. For the X variable, let’s consider veto points, or places in government where policy could get blocked, and for the Y variable, success at containing or managing the crisis. Having very few or very many veto points is associated with low success at crisis management: with too few, policy swings wildly because nothing checks the executive; with too many, policy cannot change fast enough to respond effectively. Only at middling numbers of veto points could states both change policy fast enough to respond and reassure investors and other economic actors that wild policy changes were unlikely. (This example comes from Andrew MacIntyre, “Institutions and Investors: The Politics of the Economic Crisis in Southeast Asia,” International Organization 55, no. 1 (2001): 81–122.)
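The typical way to address U-shaped and inverted-U-shaped nonlinearities is to add an X-squared term to the regression model alongside X itself. Because the squared term grows faster than X, it lets the slope of the fitted line change as X increases, so the fitted curve can bend. When the turning point lies within the positive range of X, the squared term will normally have the opposite sign from the regular X term, reflecting the change in the direction of the relationship: a negative X coefficient and a positive X-squared coefficient for a U, and the reverse for an inverted U.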
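As a minimal sketch, the code below simulates hypothetical inverted-U data in the spirit of the veto-points example (the numbers are invented, not MacIntyre’s data), regresses Y on X and X-squared with NumPy’s `polyfit`, and recovers the turning point. For the model Y = b0 + b1·X + b2·X² + error, the slope at any X is b1 + 2·b2·X; setting it to zero gives the turning point X* = −b1 / (2·b2).

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical inverted-U data: crisis-management success peaks at a middling
# number of veto points. True curve: 10 + 4*X - 0.5*X^2, so the true peak is at X = 4.
veto_points = rng.integers(0, 9, size=200)   # whole numbers 0 through 8
success = 10 + 4.0 * veto_points - 0.5 * veto_points**2 + rng.normal(0, 1.0, 200)

# Regress Y on X and X-squared; np.polyfit returns coefficients from the
# highest power down, i.e. [b2, b1, b0].
b2, b1, b0 = np.polyfit(veto_points, success, deg=2)

# The fitted slope is b1 + 2*b2*X; it equals zero at the turning point.
turning_point = -b1 / (2 * b2)

print(f"b1 (X) = {b1:.2f}, b2 (X^2) = {b2:.2f}")
print(f"estimated turning point: X = {turning_point:.2f}")
```

Here the X coefficient comes out positive and the X-squared coefficient negative, the opposite-sign pattern expected for an inverted U. If an estimated turning point falls outside the observed range of X, the data cover only one side of the curve.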
Discontinuous Relationships
One form of nonlinearity that has drawn more attention recently is a discontinuous effect of X on Y: a jump at some particular value of X. In these cases, crossing a threshold value of X triggers an abrupt increase (or decrease) in Y. For example, children from households below a certain income level are eligible for Head Start, a federally funded preschool program; children from households just above that threshold are not. So if we examined family income as a predictor of achievement, we would expect higher achievement in the Head-Start-eligible group, then a sharp drop-off at the income eligibility threshold, followed by a slower climb up to income levels where families can provide private preschool and pre-K opportunities. Not only does the level of the line change at the threshold, but its slope may change too.
Fortunately, econometricians have developed models to address these so-called regression discontinuities. They’re called, unsurprisingly, regression discontinuity models. They require a clear cutoff or threshold point, and in political science they are most commonly used (at least right now) in policy evaluation research. Very few other questions in political science have cutoffs or thresholds clear enough to make this type of model work, but it’s worth being aware that these models exist.
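A bare-bones version of the local linear idea behind many regression discontinuity analyses is to fit separate straight lines just below and just above the cutoff and measure the gap between them at the cutoff itself. The sketch below does this on simulated data: the $30,000 cutoff, the $10,000 bandwidth, the achievement scores, and the `fit_side` helper are all hypothetical and do not reflect Head Start’s actual eligibility rules or any real findings. Real analyses typically rely on dedicated tools (the rdrobust package for R, Stata, and Python is one widely used option) that choose the bandwidth in a principled way and report valid standard errors; this sketch does neither.

```python
import numpy as np

rng = np.random.default_rng(0)

CUTOFF = 30.0     # hypothetical eligibility threshold (family income, $1,000s)
BANDWIDTH = 10.0  # hypothetical window on each side of the cutoff

# Simulated data: children below the cutoff (eligible) score 5 points higher
# at the threshold, and the income slope differs on each side (illustrative values).
income = rng.uniform(10, 50, 2000)
eligible = income < CUTOFF
achievement = np.where(
    eligible,
    55 + 0.3 * (income - CUTOFF),   # eligible side
    50 + 0.15 * (income - CUTOFF),  # ineligible side
) + rng.normal(0, 3.0, income.size)

def fit_side(mask):
    """Hypothetical helper: fit a straight line to the selected observations,
    with income centered at the cutoff, and return (value at cutoff, slope)."""
    slope, intercept = np.polyfit(income[mask] - CUTOFF, achievement[mask], deg=1)
    return intercept, slope

# Keep only observations within the bandwidth, then fit each side separately.
window = np.abs(income - CUTOFF) <= BANDWIDTH
left_level, left_slope = fit_side(window & eligible)
right_level, right_slope = fit_side(window & ~eligible)

jump = left_level - right_level
print(f"estimated drop at the cutoff: {jump:.2f} points (simulated true value: 5)")
print(f"slope below cutoff: {left_slope:.2f}; slope above cutoff: {right_slope:.2f}")
```

Fitting each side separately is what allows both the level and the slope of the line to differ at the threshold.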