Using Normal Curves to Examine Climate Anomalies

The social sciences don’t have a lot of normal distributions, but our natural science friends have a few. Here, we’ll stick with the weather theme from the first section. Very conveniently, summer temperatures in the Northern Hemisphere are almost perfectly normally distributed, and we’ll look at that data for our example. Hansen, Sato and Ruedy (2012) ask a very important question: Is the average summer temperature increasing over time? To answer it, they gather temperature data from all over the world over a long period (since the 1950s) and examine trends in both global and local means, as well as trends in variability. Take a moment to jump to that paper and skim it a bit.

Our first question is always, are we looking at a population or a sample? This is immediately followed by, are we looking for information (inferences) about a population or a sample? Here, we have all the possible observations in our data, so we have a population of temperature readings: summer daily high temperatures in the Northern Hemisphere. We are also looking for inferences about population data: has the population mean of high temperatures increased in recent decades? That is, is the mean of the later populations higher than it was in the reference population? To give full consideration here, we should also examine whether the dispersion of temperatures has increased, meaning that daily highs are more variable than they used to be, and whether the data still conform to the normal distribution of the parent population.

In this paper, let’s focus on Figure 4 (page E2418), and in particular, the leftmost figure in the bottom row. The black line in each figure shows a normal distribution centered on the mean from the reference period, here, 1951-1980, of Northern Hemisphere (NH) land temperature observations.[1] The colored lines show distributions of decade-long temperature observations – populations of all spatial observations bounded by a temporal domain – for the last six decades, including the three in the reference period. The distributions are somewhat rough, but very close (in statistical terms) to normal distributions.

What is immediately obvious in the first panel is that the means of the distributions – their peaks and centers, since these are roughly normal – have moved to the right, that is, that average temperatures are increasing in each decade. But have they increased enough to be statistically different, meaning that later decades belong to different populations than the reference population? You’ll note that the x axis here is marked off in standard deviations from the mean, which is labeled as 0. So to answer this question using inferential statistics, what we’re really interested in is whether the means of the later distributions are roughly 2 or more standard deviations away from the mean of the reference (black-line) population. Why 2 standard deviations? Because that means the mean of the questionable distribution falls outside the 95%-of-all-observations band of the reference population. In our case, then, we are looking to see whether the mean of the final decade (just for convenience) is at or beyond the “2” value on the x axis.
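The 95% figure comes straight from the shape of the normal curve. As a minimal sketch (this is not anything from the paper), the following uses Python’s standard-library `statistics.NormalDist` to compute the share of a normal population lying within k standard deviations of its mean. The function name `coverage` is my own, chosen for illustration.

```python
from statistics import NormalDist

# Standard normal: mean 0, s.d. 1, matching an x axis that is
# marked off in standard deviations from the reference mean.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def coverage(k: float) -> float:
    """Share of a normal population within k s.d. of its mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


for k in (1.0, 1.96, 2.0, 3.0):
    print(f"within {k} s.d.: {coverage(k):.4f}")
```

The 1.96 row comes out at almost exactly 0.95, which is why 2 works as a convenient stand-in.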

Eyeballing the data, it’s pretty clear that the peak of the pink curve (2001-2011) is to the right of the black curve’s mean. But is it far enough away to be certain that these distributions are different? That’s less certain. At best, the pink mean is about 1.5 standard deviations from the mean of the black distribution. This is substantial, but it’s not statistically significant: it doesn’t exceed our critical value of 1.96 (2, for convenience).
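The comparison in this paragraph can be written out as a short check. This is only a sketch of the eyeball reasoning: the 1.5 is the value read off the figure by eye, not a number computed from the authors’ data, and `exceeds_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def exceeds_critical(shift_in_sd: float, coverage: float = 0.95) -> bool:
    """True if a shift (in reference s.d.) lies outside the central band
    that holds `coverage` of the reference population."""
    # Two-sided critical value: about 1.96 for a 95% band.
    critical = NormalDist().inv_cdf(0.5 + coverage / 2)
    return abs(shift_in_sd) >= critical


eyeballed_shift = 1.5  # read off Figure 4 by eye; an approximation

print(exceeds_critical(eyeballed_shift))
# Share of the reference population lying above the shifted mean.
print(round(1 - NormalDist().cdf(eyeballed_shift), 4))
```

A shift of 1.5 s.d. fails the check, while anything at or beyond roughly 1.96 would pass it.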

That answers our question about whether the mean summer temperature in the Northern Hemisphere has increased: it appears to have, but we can’t be statistically certain. The mean has only moved by about 1.5 s.d., which is below our critical value of 1.96 (2, for convenience). But we were also interested in knowing whether the variation in daily summer temperatures has increased. This is a question about the dispersion. It is a little harder to assess graphically, and we don’t have a convenient critical number to look for.[2] A rough measurement on the figure, however (using, for example, one of your fingers or your pen or highlighter), suggests that at any given height on the y axis, the pink distribution (2001-2011) is wider than the black distribution of the reference population. This supports, albeit vaguely and loosely, our hypothesis that variation has increased. Ocular analysis, meaning eyeballing the data like we have here, rarely provides answers as firm as actual statistical analysis. It can, however, be a good way to find out whether it is worth investing the time and energy to figure out what to calculate (and how) to test the hypothesis formally.

Beyond the two questions we asked, the authors of the paper are also interested in whether the number of “climate anomaly days,” where the temperature is more than 3 standard deviations from the reference population’s mean, has increased over time. This question is answered in Figure 5 (p. E2419). It is essentially a question about whether the data still conform to the normal distribution described by the reference population: do we still see only about 0.15% (the right-hand half of 0.3%) of the observations having values greater than 3 s.d. above the mean?[3] If we see more observations outside that range than the normal distribution tells us we should, then the distribution no longer matches the reference population’s normal distribution.
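To make the 3 s.d. benchmark concrete, here is a rough sketch of the expected share of observations in the upper tail of a normal distribution. The exact figures are slightly smaller than the rounded 0.3% and 0.15% used in the text. The `excess_ratio` helper and the observed share passed to it are hypothetical, included purely to show the comparison; they are not values from the paper.

```python
from statistics import NormalDist


def upper_tail(k: float) -> float:
    """Expected share of a normal population more than k s.d. above the mean."""
    return 1.0 - NormalDist().cdf(k)


def excess_ratio(observed_share: float, k: float = 3.0) -> float:
    """How many times larger the observed share is than the normal expectation."""
    return observed_share / upper_tail(k)


print(f"one tail beyond 3 s.d.:   {upper_tail(3.0):.4%}")
print(f"both tails beyond 3 s.d.: {2 * upper_tail(3.0):.4%}")

# Hypothetical observed share, for illustration only (not from the paper).
print(f"excess ratio: {excess_ratio(0.01):.1f}")
```

A ratio near 1 would mean the data still look like the reference normal distribution; a ratio well above 1 would mean more extreme values than that distribution predicts.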

We’ll continue to look at Northern Hemisphere summer land temperatures (top row). Notice that the y axis is percent of land area reporting the noted temperatures of 2 or 3 s.d. away from the mean. For this analysis, the authors divide geographic areas into cool ones, whose average temperatures are below the whole-reference-population mean (the left panel), and hot ones, analogously defined as above the mean (right panel). If the hypothesis about more warm days is correct, we should see fewer locations (a smaller percentage of land area) reporting extreme cold, and more areas reporting extreme highs.

This is indeed what the figures show us. In each panel, the left half shows the reference period and the right half shows the recent trend period. The panel on the left, of cool areas, shows a marked trend down towards zero, indicating less area with extreme cold events. The panel on the right, on the other hand, of areas that were above the mean in the reference period, shows a clear trend upward in all three lines (1 s.d., 2 s.d., and 3 s.d.). This indicates that more land area was experiencing hot, very hot, and extremely hot temperatures, respectively, at the end of the period than during the reference decades.

The point of this extended example is to help you understand that even extremely limited use of quantitative tools – here, the mean, standard deviation, and normal distribution – can produce very useful insights. Don’t be afraid to use what you have, even if it’s straightforward and seems simplistic to you. I’ve simplified the authors’ measurement strategy here (in fact, I glossed over it almost entirely), but that doesn’t mean you can’t use the techniques in this model as a starting point for your own research design.


[1] Temperature data over water depends on whether a ship passed through the area at the time and is less consistent. Looking only at the land values gives us fewer missing data points relative to the available data.

[2] A statistical test to compare the equality of standard deviations exists, but we’re not going there. (It’s called an F-test; you compute the ratio of the two variances [the squared standard deviations] and if it’s near 1, they’re statistically indistinguishable. The critical value depends on the number of degrees of freedom, which is a function of the number of observations in the data. Online calculators exist if you are really curious.)

[3] In this particular case, we are conducting a one-tailed test. Because the hypothesis is that global temperatures (and therefore ‘climate anomaly’ days) are increasing, we are only interested in values on the right-hand side of the figure, above 3 s.d. We could conceivably ask whether the number of days below the -3 s.d. cutoff has also increased, suggesting more climate extreme days than the normal distribution would predict, but we’re not interested in the 0.15% on that side of the figure, so we’ll only look at the high end. A two-tailed test would look at both sides, counting the total number of extreme days compared to the normal distribution expectation. This is normally what we do, even with directional hypotheses like our initial one, since it’s a more conservative test (the two-tailed critical value is higher than the one-tailed value).
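The parenthetical about critical values can be checked directly. A minimal sketch, assuming the conventional 5% significance level (the footnote does not name one):

```python
from statistics import NormalDist

ALPHA = 0.05  # assumed significance level, matching a 95% band

std_normal = NormalDist()
one_tailed = std_normal.inv_cdf(1 - ALPHA)      # all 5% in the upper tail
two_tailed = std_normal.inv_cdf(1 - ALPHA / 2)  # 2.5% in each tail

print(f"one-tailed critical value: {one_tailed:.3f}")
print(f"two-tailed critical value: {two_tailed:.3f}")
```

The two-tailed value (about 1.96) is larger than the one-tailed one (about 1.645), so a result has to sit further from the mean to clear it.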


Site contents (c) Leanne C. Powner, 2012-2026.
Background graphic: filo / DigitalVision Vectors / Getty Images.
Cover graphic: Cambridge University Press.
