Sampling for Your Own Research

If you are conducting your own survey research and need to draw a sample for quantitative analysis, you need to think carefully about the relevant population. Many strategies exist for drawing an unbiased sample. Which one you choose depends on how you’re accessing your population (in person, by phone, by email, etc.) and on whether you can access the whole population (prisoners and hospital patients aren’t accessible; students not living in the dorms may be harder to reach). The critical thing about your selection mechanism is that it must be completely independent of the things you are interested in. We can often achieve this by selecting on fixed, immutable values: things a person cannot change and has no influence over, and which cannot plausibly be causes of your results. These may include date of birth, where you sample n people from each day of the month, or the last two digits of a randomly assigned student ID number. We have, for example, no good reason to believe that people born on odd-numbered dates are more liberal than those born on even-numbered dates, or that people with odd-numbered student ID numbers have systematically different beliefs than those with even numbers.
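The student ID rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster is simulated, the IDs are assumed to be randomly assigned integers, and the function name `select_by_id_suffix` is hypothetical.

```python
import random

def select_by_id_suffix(student_ids, chosen_suffixes):
    """Keep students whose ID ends in one of the chosen two-digit suffixes."""
    return [sid for sid in student_ids if sid % 100 in chosen_suffixes]

# Hypothetical roster: 5,000 distinct 7-digit IDs, assumed randomly assigned.
rng = random.Random(42)
roster = rng.sample(range(1_000_000, 10_000_000), 5000)

# Draw 10 of the 100 possible suffixes at random, for a sample of roughly 10%.
suffixes = set(rng.sample(range(100), 10))
sample = select_by_id_suffix(roster, suffixes)

print(len(sample), "of", len(roster), "students selected")
```

Because the suffixes are drawn before anyone looks at the students, the rule cannot depend on anything the students believe or do. The same pattern would work for day of birth by swapping the suffix test for a day-of-month test.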

We can even use a rule such as polling every nth person who exits a store or polling place to create an unbiased sample. People don’t count or manipulate how many others have gone out before them, so if we poll every 12th person to exit, we’re selecting on a variable that is unlikely to influence outcomes such as whether the person purchased breakfast cereal that day or voted for the incumbent candidate. We might even adopt a strategy like knocking on every 9th dorm room on the left side of the hall and interviewing whichever resident answers. This strategy has two parts: identifying a room and then identifying a respondent within that room. This is how most door-to-door or in-person survey efforts work: they have a rule to identify a household, and then a rule to identify the person in the household whom they will interview. It might be a combination of rules, such as having a computer choose a household at random from a directory and then asking for the adult who had the most recent birthday.[1] The critical thing in all instances, though, is that the selection criteria are independent (or as plausibly independent as we can make them) of the outcomes of interest. Generally this means a selection rule that involves values people can’t manipulate.
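The two-part logic described above (a rule for the unit, then a rule for the respondent) might look like the following Python sketch. The directory, names, and dates are invented for illustration, the function names are hypothetical, and treating a February 29 birthday as March 1 in non-leap years is an assumption of this sketch rather than a standard survey convention.

```python
import random
from datetime import date
from itertools import islice

def every_nth(units, n):
    """Return the nth, 2nth, 3nth, ... items of a sequence or stream."""
    return list(islice(units, n - 1, None, n))

def days_since_birthday(month, day, today):
    """Days elapsed since the most recent occurrence of this birthday."""
    for year in (today.year, today.year - 1):
        try:
            occurrence = date(year, month, day)
        except ValueError:
            # Feb 29 in a non-leap year: use Mar 1 instead (an assumption).
            occurrence = date(year, 3, 1)
        if occurrence <= today:
            return (today - occurrence).days

def most_recent_birthday(adults, today):
    """Pick the adult whose birthday fell most recently.

    Each adult is a (name, birth_month, birth_day) tuple.
    """
    return min(adults, key=lambda a: days_since_birthday(a[1], a[2], today))

# Rule for the unit, option A: every 12th person to exit (people numbered 1..60).
polled = every_nth(range(1, 61), 12)

# Rule for the unit, option B: a computer picks a household from a directory.
directory = {
    "12 Elm St": [("Ana", 3, 14), ("Ben", 7, 30), ("Cy", 9, 2)],
    "40 Oak Ave": [("Dee", 12, 25), ("Eli", 1, 5)],
}
rng = random.Random(7)
address = rng.choice(sorted(directory))

# Rule for the respondent: the adult with the most recent birthday.
respondent = most_recent_birthday(directory[address], today=date(2023, 8, 18))
print(polled, address, respondent[0])
```

Neither rule leaves the interviewer any discretion, which is the point: the interviewer cannot drift toward whoever seems friendliest or most available.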

One last note on sampling is in order. Many times, pollsters are keenly interested in ensuring that their samples are representative of the population; that is, that they match or approximate the population’s distributions on key variables such as age, race, gender, and/or region of the country. They design stratified random samples that take the total number of desired respondents and slice it into bins by various characteristics. For example, the Pew Research Center, using data from the American Community Survey, estimates that 14.2% of the US population identifies as Black.[2] So if we are planning a sample of 1000 people, a fairly standard size for a nationally representative sample in the United States, we would want about 142 of them to be Black. Moreover, according to the same team, 12% of the Black population is age 65 or older, so we would want about 17 of those 142 Black respondents to be 65 or older. We then sample according to our normal rules until a bin is full, and after that, we don’t take any more respondents who fit in that bin.[3]
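The bin arithmetic and the fill-until-full rule can be checked with a short Python sketch. The shares are the ones quoted above; the function names, the toy stream of respondents, and the two age bins are hypothetical and chosen only to keep the example small.

```python
from collections import Counter

def quota(total, *shares):
    """Target count for a bin: total sample size times each nested share, rounded."""
    n = total
    for share in shares:
        n *= share
    return round(n)

def fill_quotas(stream, quotas, bin_of):
    """Accept respondents in arrival order until each bin's quota is full."""
    counts = Counter()
    accepted = []
    for person in stream:
        b = bin_of(person)
        if counts[b] < quotas.get(b, 0):
            counts[b] += 1
            accepted.append(person)
    return accepted

# Targets from the figures in the text: 1000 * 0.142 and 1000 * 0.142 * 0.12.
black_total = quota(1000, 0.142)          # 142
black_65_plus = quota(1000, 0.142, 0.12)  # 17

# Toy stream of (respondent_id, age) pairs with tiny quotas for illustration.
quotas = {"under 65": 2, "65 or older": 1}
stream = [("r1", 70), ("r2", 30), ("r3", 81), ("r4", 45), ("r5", 22)]
accepted = fill_quotas(
    stream, quotas, lambda p: "65 or older" if p[1] >= 65 else "under 65"
)
print(black_total, black_65_plus, [rid for rid, _ in accepted])
```

In the toy run, r3 and r5 are turned away because their bins are already full when they arrive, which is the same screening step a respondent experiences at the start of an online survey.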


[1] Astrology aside, we have no substantiated reason to believe that adults born in different months have systematically different beliefs and opinions. For children, this becomes a bit more complex because birth date determines school entry year. If a school district’s cutoff date is September 1, a child born on August 31 would be a grade ahead of one born on September 2 and thus have a year more schooling, knowledge, and socializing experiences than his two-day-younger friend. Many arbitrary cutoffs like this exist, such as the entry into force of a treaty, a change in a law or regulation, or a natural or manmade disaster. Sometimes we can use these arbitrary cutoffs to study phenomena of interest right around the cutoff point; such studies often use regression discontinuity designs.

[2] Data from https://www.pewresearch.org/social-trends/fact-sheet/facts-about-the-us-black-population/; version of March 2, 2023, accessed Aug 18, 2023. They distinguish here between Black as a racial category and African-American, which implies citizenship or residency status. A non-negligible share, perhaps 10% of the Black population, is immigrant.

[3] This is why online surveys, if you’ve ever tried taking them, often begin by asking about birth date and gender, and sometimes race. They are trying to make sure they haven’t already filled their quota of people like you.

Site contents (c) Leanne C. Powner, 2012-2026.
Background graphic: filo / DigitalVision Vectors / Getty Images.
Cover graphic: Cambridge University Press.
