Glossary

This glossary contains not only the terms from the book itself but also a bunch of other methods terms you might run across. Items from the book are tagged with the chapter number(s) at the end of the definition. This was constructed by merging two separate glossaries that I’ve compiled, so the definitions listed here may differ slightly from those in ERW.

See another methods term you want defined? Feel free to use the Contact form on the About Me page to send it in.

Term Definition Ch
*.csv Comma separated file – generic spreadsheet file readable by all stats programs 8
*.do file  Stata file containing a list of commands to be executed in a single batch 9
Adjusted R2 Goodness of fit measure for multivariate OLS regression; adjusts baseline R2 to account for degrees of freedom lost to increased IVs; similar interpretation to R2 9
AJPS American Journal of Political Science  
Alternative hypothesis technical term for the hypothesis derived from our theory that we are testing, usually that some relationship exists. Contrasts with the null hypothesis, which generally argues that no relationship exists.  
Analytic narrative Tool for qualitative analysis of hypotheses derived from formal models; variable focused 5
ANES American National Election Studies  
Annotated bibliography Preliterature review tool combining complete citations for sources with a brief summary and commentary on each piece 4
APA American Psychological Association (citation format)  
APSR American Political Science Review  
Array MS Excel terminology for a range of cells, consisting of one or more columns and one or more rows. Specifies the range of cells on which a formula is to be applied or calculated.  
Assumption Part of a theory: claims or beliefs about how the world operates; underlying (set of) condition(s) that is/are necessary for a theory to hold. 2
Asymptotic tends to refer to properties of estimators, sampling distributions, etc., with infinitely large samples. In reality, we never have infinitely large samples, so we can only approximate asymptotic properties of these tools. (Not usually an issue of concern until you get to advanced OLS, MLE, and such.)  
Autocorrelation (serial) arises in time series data when each period’s value on a variable is best predicted by its value in the previous period. De-meaning the variable (removing the mean to expose the trend) is one strategy for early analysis; more sophisticated methods for dealing with trend-based data or multi-period trends (for example, an annual pattern of precipitation fluctuation or consumer spending) are available at higher levels of training.  
Background attribution Level of interview confidentiality generally allowing data from the interview to be cited to an interview provided that the individual speaker cannot be identified from the citation; direct quotes may or may not be permitted by the interviewee 7
Baseline model Initial or foundational specification to which refinements, additions, and adjustments are made 10
Bias comes in many forms. Measurement, estimator, selection are some of the big ones.  
Bibliographic management software Electronic tools that manage, collate, and format citations in manuscripts 4
Bibliography-hopping Literature discovery process in which the researcher uses items cited in prominent or recent work to locate other relevant pieces and authors further back in the scholarly chain 4
Big Data approach to (typically non-causal) research involving great quantities of automatically generated and rapidly proliferating data 5
Binomial distribution Distribution for 0-1 data A
BLUE Best Linear Unbiased Estimator, a descriptor of OLS regression when the Gauss-Markov assumptions are met. Best = narrowest possible sampling distribution, Linear = of the class of linear models, Unbiased = mean of sampling distribution is centered on the true mean of the population, Estimator = statistical tool  
Case control method Between-case tool for qualitative analysis of carefully matched cases; matching on key variables associated with alternate hypotheses allows logical exclusion of these variables as causes of outcomes 5
Causal homogeneity  fancy way of saying that the cases in a certain group or population share the same causal chain. Is an assumption for certain forms of quantitative analysis.  
Causal mechanism Part of a theory: the specific chain of events that leads from the independent variable to the dependent variable 2
Causal Process Observation (CPO) Process-tracing observation (within a case) that captures a sequence of events, phases, or characteristics at a given point within a larger “case” 6
CCES (CES) Cooperative (Congressional) Election Study data project  
Census, population An exhaustive listing of all elements (individuals, cases, etc.) of the population 6
Central tendency a descriptive (univariate) statistic which varies by level of measurement and captures the location of a ‘typical’ observation for that variable A
Ceteris paribus assumption Assumption, often implicit, that all other variables are held at constant values when we consider the effect of changing one IV’s value on the DV 8
Chi-squared test nonparametric statistical test that compares the distribution of two variables to determine whether the distribution of one variable is influenced by its value on the second variable. Test is symmetric, meaning which variable is A/rows and which is B/columns does not matter to the test. Theory is required to interpret. Test is computed using observed values in each cell minus the expected values, squaring, dividing by the expected value, then summing across all cells. This is compared to the chi-squared distribution for a p-value.  
Codebook Document accompanying a published or replication dataset explaining coding rules, variable values, and related information for that data; sometimes called documentation. (Alt: file of questions or data points required that indicates all possible values for each question. These include not applicable codes, skip patterns where some questions are irrelevant because they are follow-ups to an inapplicable question, options for “other,” etc. Exhaustive and complete codebooks help to produce valid and reliable data.) 8
Codes, coding Process of applying measurement rules to evidence to produce data 8
Collinearity Excessive correlation between IVs of a multivariate model; violates assumptions of regression and similar models; requires correction for values above about 0.6 9
Colloquialism Casual or informal term that is inappropriate in formal writing 11
Comparative statics Study of the effects of changing one variable from a given state or value to another; often used to refer to hypotheses derived from formal or similar models about the effects of changing variable values 6
Composite variables Variables created by combining and/or manipulating other variables’ values 9
Concept abstract representation of a phenomenon of interest. Uber-concepts (Leanne’s term) have multiple forms of basic concepts underneath them. Basic concepts are the workhorse of most theories. Per Goertz, concepts then have multiple constituent components. 2
Concept web Form of graphic organizer emphasizing nonlinear and multiple connections between concepts 2
Conditional hypothesis Claim that the effect of one IV is dependent (conditional) on the effect of another IV 3
Confounder a variable whose effect potentially interferes or competes with the effect of your hypothesized variable in explaining an outcome, usually by being correlated with both the IV and DV in your model. Confounders often enter a regression model as control variables; in the case of bivariate statistics, confounders must be addressed in other manners.  
Confounding variable Variables correlated with both the DV and IV of interest 9, 10
Consistency property of an estimator or other statistic that its estimates converge on the true population value as the sample size grows. (Contrast with unbiasedness, where the mean of the sampling distribution equals the true population value.)  
Constant A variable whose value does not change across cases; OR another name for the intercept in regression 6
Constant (1) a characteristic of phenomena which does not vary across the instances of the phenomenon under consideration  
Constant (2) the value of b0, the intercept of a regression model or other statistical estimator  
Content analysis Family of techniques for examining patterns in written, spoken, or visual corpora 5
Control variables variables in a multivariate analysis that we believe affect the value of our outcome variable. Control variables are necessary to ensure that variation in Y is attributed to variation in the correct X.  
Control variables (CVs) Variables that we believe influence the DV along with our IV(s) of interest; must be accounted for in any qualitative or quantitative analysis to obtain accurate results 2, 6, 7
Corpus formal term for a body of text we want to analyze (pl: corpora) 5
Correlation a bivariate measure of association. Atheoretical. Tells us only how close the points fall to some imaginary line of best fit, not how steep the line is.  
Corresponding author Author designated in a coauthored piece as the contact point for inquiries 8
Counter-examples Single or rare cases that do not fit the hypothesis; possibly defined as outliers in some contexts 3
Counter-explanations Hypotheses that explain adverse findings 3
Counterfactual Thought experiments used in qualitative research to consider the potential outcomes of (unobserved) variable values or combinations. Good counterfactuals manipulate one variable at a time. 7
Country-year Dataset structure where observations are each country in each year; common in comparative and international politics research 8
COW Correlates of War data project  
CPO See Causal Process Observation  
Critical value Cutoff point in the distribution of a test statistic such as χ2 or t to determine statistical significance 8
Cronbach’s alpha (α) Statistic indicating how closely the components of a composite indicator capture the same underlying concept; higher values indicate better internal consistency, with a commonly accepted  cutoff around 0.7 9
Cross-sectional data structure of one observation of many units  
Cross-tabulation Bivariate analytical technique for nominal or ordinal level IVs and DVs; indicator of statistical significance is usually χ2 5
CSES Comparative Study of Electoral Systems data project  
CV See Control Variable 6,7,8
DA-RT Data Access and Research Transparency initiatives 8
Data Intentionally gathered information, usually reflecting values of variables 7
Data (set) structure Spatial and temporal dimensions of the data. Common data set structures are cross-sectional, time series, and panel (time-series cross-sectional or TSCS).  
Data archive Centralized storage facility for collected data; best known are ICPSR and Harvard Dataverse 8
Data cleaning Process of ensuring that data is correctly coded, missing data is appropriately handled, and other possible errors are detected and addressed 9
Dataset Collection of intentionally gathered information (data) normally consisting of one value for each variable for each observation 7
Dataset observation A standard qualitative or quantitative observation establishing one and only one value for every variable for each case in the dataset 6
Deductive theorizing Researcher derives hypotheses from generalized analysis outside the context of any specific case 2
Degrees of freedom  Roughly speaking, the number of independent pieces of information available for analysis after various calculations constrain the data 6,9
Dependent variable The outcome variable in a hypothesis; known as the “left-hand side (LHS)” variable due to its placement in a statistical model 3
Descriptive statistics Summarize and describe characteristics of data (mean, median, mode, standard deviation, etc.) without reference or generalization beyond the available data itself. Univariate: Calculated for individual variables. May also include calculations of skew for variables at levels of measurement where this is an appropriate concept. 8, A
Deterministic hypothesis Class of hypotheses arguing that a particular relationship should hold across all cases; includes primarily hypotheses of necessity, sufficiency, and necessity and sufficiency 3
Deviation A value minus the mean of that variable; applicable to interval-ratio data only A, B
Diagnostics (of a model) Crucial information about a statistics model included at the bottom of a table; always includes N and an appropriate measure of goodness of fit 10
Dichotomous Describes an indicator or variable that takes on only two possible values; usually framed as a “yes/no” item; sometimes called “dummy” variables 3
Dichotomous a variable having two values, namely 0-1. Sometimes called a dummy variable.  
Difference in means See t test B
Direct relationship A positive association between two variables – they both increase or decrease together 3
Directed dyad pair of states where some flow of action is directional: US→UK, UK→US. Directed dyads always come in pairs.  
Directional hypothesis Predicts increases or decreases in a DV as a function of increases in one or more IVs 3
Dispersion Measure that indicates the ‘spread’ of a distribution; varies by level of measurement 7
Dispersion a descriptive (univariate) statistic which varies by level of measurement and captures the spread of the observations around the central tendency  
Distribution Graphical aggregation of data points A
Document analysis Qualitative analytical technique: systematic review of primary, secondary, and occasionally tertiary sources 7
Digital Object Identifier (DOI) Stable, permanent identifier (resolvable as a web address) for research papers, datasets, and other scholarly publications 4
DOI See Digital Object Identifier  
Domain Spatial and temporal scope of a theory or study 2,8
Double-blind Form of peer review in which neither the author(s) nor the reviewer(s) know the other’s identity 4
DSO See Dataset Observation  
Dummy variable See dichotomous variable  
Dummy variable a binary (0-1) variable that is used as an indicator of membership in a particular category. It codes as a 1 if the case is part of that category. Ex: the United States would get a 1 for the geographic region “North America.” Ordinarily, these categories must be mutually exclusive and exhaustive, and we must omit one value of a set of dummies (say, geographic regions) as a reference category.  
DV Dependent variable 2
Dyad Pair of states or other actors (used as a unit of analysis) 2, 8
e abbreviation for the error term, or the amount by which a curve fit through the data ‘misses’ the point in question. Points above a line have a positive e; points below the line have a negative e. Calculated as y(observed) – y-hat  
Efficiency property of an estimator, referring to how skinny (tightly clustered) its sampling distribution is around its mean. Skinny is better because it means a better estimate of the mean, BUT if the mean is biased, we’ve got a good estimate of the wrong answer.   
Elite interviewing Interviews conducted with individuals chosen because of positions they occupy, rather than as representatives of some larger class; individuals need not be high profile to qualify as elites 7
Empirical Type of research question: asks about how the world is, based on observable data and tested by scientific observation or experiment 1
Empirical Rule Characteristic of the normal distribution in which approximately 68% of the observations fall within +/- 1 standard deviation of the mean, 95% within +/- 2 standard deviations of the mean, and 99.7% within +/- 3 standard deviations of the mean. A
Endogeneity Problematic circumstance where an IV is correlated with the error term, most often because the DV also causes the IV (reverse causation); requires deployment of appropriate fixes to estimate relationships. Instrumental variable and two-stage least squares (2SLS) strategies are the usual solutions. 9
Endogenous coming from within the system under study  
Episodic record Qualitative data source: reports produced on an inconsistent basis, sporadically, or one time only 7
EPSEM sample selection procedure: Equal Probability of Selection Mechanism A
Equifinality Property of the social world where many causal routes to a single outcome exist 5
Estimator a particular tool for statistical analysis which estimates relationships in the data. Regression, logit, and probit are examples of estimators, but thousands exist. (Descriptive statistics are not estimators because they contain no uncertainty.)  
Eta (H) Greek letter referring to the proportional reduction in error (PRE) test.  
Exogenous coming from outside of the system under study  
Factual (procedural) Type of research question: identifies the basic facts of a situation 1
Falsifiability Characteristic of a theory: We can identify observable implications that would occur if the theory were incorrect 1
Falsifiers Specific pieces of evidence that would falsify a theory; the researcher would expect to find these if the theory were incorrect 3
FE See Fixed effects  
Fertility Characteristic of a theory: suggests other observable implications or novel hypotheses 1
Field experiment Quasi-experiment conducted in the real world, outside a controlled laboratory setting. Some variables are manipulated, usually those in the treatment, but others are usually handled via randomization (e.g., who in a household is interviewed) or through some form of purposive sampling (e.g., which neighborhoods we sample for the study).  
Finding aid Guides to special or archival collections detailing temporal and substantive scope, material origins, etc. 7
Fixed effects Systematic variation across units in a study that is correlated with both DV and IV of interest or battery of dummy variables included in a study to capture fixed effects 9
Formal model Form of deductive reasoning that uses game theory and similar tools to depict and analyze situations in the abstract 5
FSA See Fuzzy Set Analysis 5
Fully attributed Level of interview confidentiality allowing direct quotes and identification of speaker by name; rarely used in social science 7
Fundamental Problem of Causal Inference To determine causation with certainty, we need to know what outcome would have occurred under alternate value(s) of the variable(s) of interest. This is impossible to achieve in the social scientific context because we cannot conduct pure experiments like the natural sciences can.  
Fuzzy Set Analysis (FSA) Similar to Qualitative Comparative Analysis; analyzes data using Boolean logic where elements of the sets can have degrees of membership (i.e., cases can be evaluated as having or being more or less of some characteristic) 5
Gamma statistic Test statistic for strength and direction of association between ordinal variables 5
Gauss-Markov Conditions/Assumptions set of complex assumptions about OLS regression that must be met for it to be the best linear unbiased estimator (BLUE). Depending on who’s counting, may appear as anywhere between 4 and 9 distinct assumptions. Mostly, the model is linear in the b’s (no b^2 or higher order terms, but X’s can be ^2+); X’s cannot be overly correlated with each other or a linear sum of 1; Y cannot cause any X; and the error terms must be uncorrelated and randomly distributed (which is generally met if the Xs are not overly correlated).  
GDP Gross domestic product; an indicator of national economic production  
GDPPC Gross domestic product per capita; an indicator of relative wealth across countries  
General Social Survey Large-scale study of the American population conducted yearly (1972–1994) or in alternating years (1996-present) 8
GOF See Goodness of fit (statistic)  
Goodness of Fit statistic Summary value computed by models to indicate how well they explain the data; varies by model type, with OLS and variants using R2 or Adjusted R2 and MLE based models using log likelihood. Measures of correctly predicted cases (for ordinal and nominal outcomes) are another type of goodness of fit statistic, but are less common and not automatically reported with statistical output. 10, 11
Graphic organizer Visual depiction of information and relationships 8
GSS General Social Survey 8
GUI Graphical user interface; the point-and-click menus in a software platform 6
Hard test A context in which a theory is least likely to be successful 6
HARKing Hypothesizing after results known – ethically questionable practice in deductive empirical research  3
HDI Human Development Index; a cross-nationally comparable measure of development that combines income with non-economic indicators (health and education)  
Hypothesis A statement of the relationship that the researcher expects to find between her dependent variable and independent variable(s); usually phrased in terms of indicators 3
Hypothesis an operationalized version of a theory, indicating the specific direction and type of relationship between two (or more) indicators  
Hypothesis notation Shorthand form denoting independent & dependent variables, direction of change for each, and a causal direction arrow 3
Hypothetical Type of research question: tries to predict what might happen 1
ICPSR Inter-University Consortium for Political and Social Research 8
ILL Interlibrary loan 4
Impact factor (IF)  a measure of journal influence in the field, usually calculated as the average number of times an article from that year’s issue is cited in other journals in the following year (sum of citations / sum of articles). In political science and international relations, values above 2-2.5 are considered respectable. Depending on the year and the specific calculation method (several measures of ‘impact’ exist), the top journals in political science have IFs between 5 and 7.  
Independent variable The causal or explanatory variable in a hypothesis; known together with control variables as “right-hand side (RHS) variables” due to their placement in a statistical model 3
Index Tool of data reduction: combines multiple indicators into a single measure, often by averaging (pl: indices) 9
Indicator Observable characteristic; measurable version of a concept 3
Inductive theorizing Researcher identifies potential explanations from one or a few cases, then generalizes to other cases 2
Inferential statistics Branch of statistics that generalizes from samples to populations 8, A
Instrumental variable One common “solution” to endogeneity problems; a variable correlated with one of the problem variables is used as a synonym of sorts; occasionally called 2 Stage Least Squares (2SLS) 9
Interaction term Composite variable generated by multiplying two or more component variables together; used to test conditional hypotheses.  Cannot be entered or interpreted in a statistical model without all constituent variables being included in the model & calculations. 9
Interactive hypothesis See Conditional hypothesis 3
Inter-coder reliability ratings Measure calculated on data where multiple coders review a given case; indicates how well the coders agree on the values of the variables; sometimes called inter-reader reliability rating 8
Interlibrary loan (ILL) System allowing researchers to request items not owned by their own library from other sources 4
Intermediate-N designs Techniques for analysis of between approximately 30–50 cases 5
Interquartile range a measure of dispersion for interval/ratio data, consisting of the data from 25th to 75th percentile (the range occupied by the middle 50 percent of the data)  
Interval level of measurement where units are constant across the full range of values; must have a unit attached to define the interval (dollars, deaths, years, etc.). Has no absolute zero; negative values are both possible and have meaning. Usually pooled with ratio-level variables for analysis.  
Interval-ratio Level of measurement: continuous or discrete quantities with consistent units attached such as years or votes; most precise level used in political science 3,5,7, A
Inverse relationship A negative association between two  variables—one increases as the other decreases 3
IRB Institutional Review Board (human subjects board), responsible for ensuring compliance with human subjects research rules 7
IV Independent variable 2
Jitter term used to describe placement of dots or indicators in a figure, where instead of piling dots on top of one another indistinguishably, the figure creator ‘jitters’ the dots by adding or subtracting a very small amount from the value so that the dots spread out and emphasize the clustering there in a way that piling on does not.  
Journal Storage Project (JSTOR) Massive database of full-text journals in many disciplines, with extensive historical holdings 4
k shorthand for the number of right-hand-side (RHS) variables  
Laboratory experiment Social science experiment conducted in a controlled setting, where more values are manipulable. May involve computer-facilitated or mediated interaction, peer-to-peer play, or many other variants. Lab experiments using students or college-town residents are generally seen as unrepresentative.  
Lagged variables Variables observed in a period prior to the period in analysis—for example, GDP from a prior year 9
Law of Large Numbers As sample size grows, the mean of the sample will converge on the mean of the population. Combined with the Central Limit Theorem (not defined here), we observe convergence of sampling distributions for means, regression coefficients, and many other statistics.  
Level of measurement Degree of precision used in the operationalization of a variable; most common in political science are interval-ratio, ordinal, and nominal 3, A
LHS left-hand-side, short form for the dependent or outcome variable(s) in a statistical model. Any single statistical model can only have one outcome variable. To work with multiple outcome/dependent variables at once, you need to use systems of equations or other more sophisticated models that we do not cover.  
Likert scale symmetric scale developed by psychologist Rensis Likert in 1932 (as part of his PhD dissertation) to measure attitudes and preferences along a scale. It produces an ordinal level variable, often of the form “strongly disagree, disagree, neither agree nor disagree, agree, strongly agree.” The inclusion of a middle category has substantive and methodological implications; the Wikipedia page is informative as a place to start. 4
Linear probability model Uses regression to analyze data with dichotomous DVs; problematic because it violates several key regression assumptions, but usually viable for student work 5
Literature, scholarly  Body of research about a research question or research theme 4
Log transformation Logarithmic transformation of a variable exhibiting significant skew or exponential distribution; the natural log of x is the value to which the mathematical constant e must be raised to obtain x 9
Logit Common tool, with probit, for quantitative analysis of dichotomous DV data; test statistic is significance of coefficients though interpretation of coefficients is more complex than in regression 5
Main diagonal Set of cells in a square table or matrix running from the top left corner to the bottom right corner; typically corresponds to cases where X = Y 3
Marginal (of a table) Row and column at the edges (margins) of a table indicating total for that row or column; must balance across and down A
Marginal effect Change in DV value resulting from changing IV value(s) in the manner specified; appropriate measure of effect size for OLS and other multivariate estimators 10
Mean Arithmetic average; measure of central tendency for interval-ratio variables A
Measurement Process by which information is converted into systematized values of variables that are comparable across observations 7
Median Value above and below which 50% of the values in a distribution lie; measure of central tendency for ordinal variables  A
Methodology The study of research design and study (and creation) of new analysis techniques; specialists in this field are methodologists 5
Mixed methods design research designs combining elements of qualitative and quantitative validation of different observable implications for the same research question 5
MLA Modern Language Association (citation format)  
MLE Maximum likelihood estimation 9
Mode Value that occurs the most frequently in a distribution; measure of central tendency for nominal variables. A single variable may have more than one modal value if more than one category is tied for the highest frequency. A
Monograph Scholarly book or other extended work, usually written by a single author 4
Multicollinearity See Collinearity 8
Multinomial logit Quantitative tool for analyzing DVs with three or more nominal categories; functionally identical to polychotomous probit 5
N Number of observations  
Natural experiment Situation where the assignment of cases to experimental and control conditions is determined by nature or exogenous forces, but that assignment is arguably random or very close to it 6
Necessary condition Asserts that some cause X is required for the outcome Y to occur; implies that Y cannot occur in the absence of X 3
Negative cases cases where the outcome of interest is *not* observed. Necessary to include in a dataset because we cannot explain a constant (having all cases exhibit the outcome of interest) with a variable, but sometimes hard to identify because of selection effects.  
Negative evidence Cases whose significance to hypothesis testing is the absence of the phenomenon of interest 6
NES See ANES  
No relationship, hypothesis of Claim that one or more IVs has no (usually statistically) discernible effect on another; implies that the coefficient is not statistically significant 3
Noise random (stochastic) error in a model or even in the measurement of a variable that causes the reported value to deviate from the true value. Imagine trying to measure rainfall near you. Two collection containers placed near one another could interfere with the wind patterns affecting collected rain amounts, as could buildings or other features. Even when placing the containers out in the middle of the street, small variations in the recorded amounts would occur because our measurement devices are not very precise.  
Nominal Level of measurement: unrankable but discrete categories, with no implied direction or magnitude; lowest precision of measurement.  Categories must be exhaustive and mutually exclusive. 3, A
Normal distribution Classic bell-shaped distribution for continuous variables; has special known properties for probability under the curve, sampling distribution of mean, and more. Conforms to the Empirical Rule. A
Normative Type of research question: focuses on what should happen 1
Not-for-attribution Level of interview confidentiality not allowing any direct reference to an interview (or interview subject) as the source; all information gathered at this level must be triangulated 7
Null hypothesis in quantitative scholarship, generally a claim that the expected relationship is 0 (i.e., that a null relationship exists). If the hypothesis is one of no relationship, then the null is that a relationship exists (generally, without a specified direction, because any direction of hypothesis would invalidate the alternative hypothesis under examination).  
Observable implications Empirical patterns that should emerge if a theory is correct 3
Observation A single instance of the phenomenon under investigation 7, 8
OLS Ordinary least squares regression; see Regression 5
Omitted variable bias Incorrect estimates of relationships (qualitative or quantitative) resulting from failure to consider a relevant variable 9
Ontology Beliefs about the nature of being: in social sciences, how the world is constituted 2
Open access publication format providing ungated access to research reports in peer-reviewed publications 3
Operationalization Process of identifying a valid observable indicator for the concepts expressed in a theory 5, 8
Ordered logit Quantitative tool for analyzing DVs with three or more ordinal categories; functionally identical to ordered probit 5
Ordered probit Quantitative tool for analyzing DVs with three or more ordinal categories; functionally identical to ordered logit 5
Ordinal Level of measurement: rankable categories, where the intervals between categories may or may not be equal or precisely definable; intermediate level of measurement. The values are not mathematically manipulable. 5, 7, A
Ordinary Least Squares regression See Regression 5
Outlier An observation that is decidedly outside the usual range for the variable, particularly conditional on relevant covariates (e.g., a country with a very high GDP but a very small population, or a person with very high education and very low income)  
OVB See Omitted Variable Bias 9
Panel data Multiple units observed at multiple points in time; also called time series cross section data 8
Parameter a value calculated from a population  
Parsimony Characteristic of a theory: explains more while using less 1
Passive voice Grammatical structure in which the sentence’s subject receives the action rather than performing it 11
Peer-review process Procedure by which scholars evaluate each other’s research for rigor and completeness prior to publishing; acts as a gatekeeping device for professional publications; typically double-blind 3
Poisson distribution count distribution for discrete, non-negative numbers (no decimals or fractions allowed). When the average count is low, it typically shows a very high number of zeros and ones and a rapidly decreasing number of observations of 2, 3, 4, or higher values.  
Polychotomous Describes an indicator or variable that takes on three or more unordered values (i.e., is at the nominal level of measurement) 5
Polychotomous probit Quantitative tool for analyzing DVs with three or more nominal categories; functionally identical to multinomial logit 5
Pool(ed) (1) describes the process of combining values of a variable or sets of cases. This might occur to group similar cases or values together, such as pooling “agree” and “strongly agree” responses in a Likert scale, or pooling all countries with GDP within a certain band into a “middle income countries” group for analysis.  
Pool(ing) (2) Pooling refers to situations or processes where multiple paths lead to the same outcome, such that the outcomes become indistinguishable. Students who fail a class may appear to pool together, though their routes to failing may be very different. Sometimes our research questions focus on or require disaggregating pooled situations; other times, pooling is acceptable.  
Pooling Process of combining or collapsing value categories in a distribution; loses information but may increase analytical traction B
Population The complete set of all cases relevant to a theory 6, 8
Possibility principle Guideline for choosing negative cases for qualitative analysis: at least one IV takes a value that theory claims is crucial for the DV to occur, and no IVs predict against the outcome of interest 6
Prediction the core of a theory, stating the expected relationship between the concepts at the heart of the theory. May be a process prediction or an outcome prediction.  
Predictiveness Characteristic of a theory: theory helps explain cases other than those from which it was derived 1
Preprint working paper/research report that is in draft form and not yet published or peer-reviewed 4
Preregistration process of publicly filing hypotheses, data collection, & data analysis plan before analysis to prevent HARKing 3
Pretesting  Evaluating data collection or coding instruments against a small sample of cases or sources prior to beginning full-scale data collection 8
Primary source Qualitative data source characteristic: no analysis separates the researcher from the source’s creator 7
Probabilistic hypothesis Class of hypotheses arguing that a relationship holds across a pool of cases even though individual cases may not support the relationship; four main types are directional, relative, no relationship, and conditional (interactive). 3
Probit Common tool, with logit, for quantitative analysis of dichotomous DV data; test statistic is significance of coefficients though interpretation of coefficients is more complex than in regression 5
Process tracing Within-case qualitative approach to analysis that explores processes in social interaction. Hypotheses or expectations are usually about the order or nature of steps in the process rather than about outcomes per se. World War I had only one outcome, and technically only one ‘cause’ (who shot first), but it had many steps along the way where we could form expectations about the ways in which, for example, breakdowns in crisis communication increased the chances of the July crisis turning into a violent conflict. 5
Proportional Reduction in Error (PRE) nonparametric test (known as eta, or H, in Greek) that compares the rate of accurate outcome predictions from a theory to a naïve prediction of the modal value.  
Proximate cause Immediate and direct trigger of outcome of interest 5
Purposive sample A deliberately selected subset of cases chosen for their values on particular key variables 6
QCA See Qualitative Comparative Analysis 5
Qualitative Comparative Analysis (QCA) Analytical technique for qualitative data using Boolean logic to test propositions of necessity and/or sufficiency; requires variables to be fully in or out of a set (i.e., all variables must be dichotomous) 5
R&R See Revise and Resubmit 11
R2 Goodness of fit measure for bivariate OLS regression; interpreted as the percentage of DV variation explained by variation in the IV 9, 10, B
Random sample cases are selected using a random number generator or other process from a known and fixed population.  
Range Measure of dispersion for ordinal variables A
Ratio like interval variables, a level of measurement where the units are constant across the full range of values; must have a unit attached to define the distance between points (dollars, deaths, years, etc.). Has an absolute zero; negative values neither make sense nor are defined (e.g., battle deaths).  
Recoding Applying a new scale or coding scheme to existing data to change its level of measurement or other characteristics 9
Regression Workhorse tool of quantitative analysis; requires interval-ratio level DV but IVs may be any level of measurement; test statistic is significance of coefficients 5
Relative hypothesis Claim comparing the magnitude of effect of two or more independent variables 3
Reliability Characteristic of a measurement tool: produces values that are consistent across cases and applications 7
Replicability Characteristic of research: sufficient transparency in research practices and reporting to allow another researcher to recreate our analysis 1
Research program Set of related research questions often drawing on the same concepts and theories 4
Research question A bounded statement of the phenomenon under investigation, usually focused on explaining variation in an outcome 1
Research topic An unbounded statement of the phenomenon of interest to the researcher 1
Residual The leftover, unsystematic parts of an observed value; calculated as observed value minus expected value in regression. In OLS, the residuals always sum to 0 (the mean creates an even distance of points above and below), which is why we normally square them before analyzing. B
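A minimal sketch of the residual calculation, assuming a bivariate OLS model that includes an intercept. The data values and the helper name `ols_fit` are made up for illustration; they do not come from the book.

```python
# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_fit(x, y):
    """Return (intercept, slope) for a bivariate OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = covariation of x and y divided by variation in x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = ols_fit(x, y)

# Residual = observed value minus expected (fitted) value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# The raw residuals cancel out (up to floating-point rounding) ...
residual_sum = sum(residuals)
# ... so we square them before adding them up.
sum_squared_residuals = sum(e ** 2 for e in residuals)

print("sum of residuals:", residual_sum)
print("sum of squared residuals:", sum_squared_residuals)
```

The raw sum is uninformative because positive and negative residuals offset each other; the squared sum is what OLS actually minimizes.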
Reverse outline process of identifying paragraph topic sentences and compiling them to see if the sequence of ideas is effective 11
Revise and resubmit Journal or other publication decision returning a paper to its author for significant revision and often re-review as a precursor to acceptance for publication 11
Rhetorical questions Questions posed by authors with no intention that the reader will respond; cheap transition device worth avoiding 11
RHS right-hand-side, short form for all of the independent, control and other variables entered into a statistical model (including things like fixed effects, etc.): Any and all variables that are not the outcome(s) under study.  
Robustness checks Additional model specifications estimated using alternate indicators of key concepts to determine that findings hold across different operationalizations of those concepts 8, 10
Running record Qualitative data source: reports produced on a systematic and recurrent basis: hourly, daily, annually, etc. 6
Sample Subset of the population obtained/used for analysis 8
Sampling distribution Distribution of some statistic, such as a mean, calculated on repeated samples from a population; some have special properties. As sample size grows, the sampling distribution for means and regression coefficients converges to the normal distribution, no matter the shape of the initial distribution, allowing us to use what we know about the normal distribution’s confidence intervals to make inferences. A, B
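One way to see this is to simulate it. The sketch below assumes a deliberately skewed, made-up population (exponential with mean 1) and draws many samples of 50 cases each; the population, sample size, and function name `sample_means` are all illustrative choices, not anything from the book.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical, heavily right-skewed population.
population = [random.expovariate(1.0) for _ in range(100_000)]


def sample_means(pop, sample_size, n_samples):
    """Draw n_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(random.sample(pop, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=50, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Center and spread of the simulated sampling distribution of the mean.
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Textbook standard error of the mean: population SD / sqrt(sample size).
expected_se = pop_sd / 50 ** 0.5

# In a skewed distribution the mean and median sit far apart;
# in a roughly symmetric one they nearly coincide.
pop_gap = abs(pop_mean - statistics.median(population))
means_gap = abs(mean_of_means - statistics.median(means))

print("population mean vs. mean of sample means:", pop_mean, mean_of_means)
print("SD of sample means vs. expected SE:", sd_of_means, expected_se)
print("mean-median gap, population vs. sample means:", pop_gap, means_gap)
```

Even though the population is far from bell-shaped, the sample means cluster symmetrically around the population mean, with a spread close to the standard error.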
Scale Tool of data reduction: combines multiple indicators into a single measure, usually by summation; see also Index 9
Scooping Exploiting someone else’s data by publishing analysis based on it before the collector is able to do so; considered very inappropriate professional behavior 8
Scope condition Part of a theory: defines the domain of the theory or any other restrictions or boundaries on what cases the theory should be expected to explain 2
Secondary source Qualitative data source characteristic: one layer of analysis separates the researcher from the source’s creator 7
Selection bias The result of analyzing data that suffer from a selection effect 6, 8
Selection effect Natural or man-made processes produce an observed sample that is a biased subset of the underlying population; not all cases have an equal chance of entering the observed sample 6, 8
Sensitivity tests See Robustness checks 11
SFC See Structured Focused Comparison 5
Significance In the statistical sense, an indication that the observed relationship is not 0 9, A, B
Simultaneity Special case of endogeneity where the DV causes one or more IVs; requires deployment of appropriate fixes to estimate relationships. (This violates one of the Gauss-Markov conditions, which tell us when OLS is BLUE (Best Linear Unbiased Estimator).) 9
Simultaneity bias Incorrect coefficients (or qualitative relationship estimates) obtained from neglecting to consider the effect of Y on X as well as the effect of X on Y 9
Skew Characteristic of continuous distributions; concerned with distance between mean and median values 8, A
Snowball sample Sampling procedure in which interviewees are asked to name other relevant individuals to interview, who are then interviewed and asked to name other individuals, etc. 7
Soaking and poking Inductive qualitative research technique involving deep immersion in a social context 6
Social Sciences Citation Index (SSCI) Tool for discovering work citing central or prominent articles, forward in time from the starting piece; sometimes called Web of Science 4
Spearman’s rho Test statistic for strength and direction of association between ordinal variables 5
Special collections Library resources on particular topics, often including archival and nontextual material; special collections typically do not circulate 7
Specification Particular combination of variables included in a statistical model 9
Square (table) Has an equal number of rows and columns B
SSCI Social Sciences Citation Index 4
Standard deviation Measure of dispersion for interval-ratio variables. Calculated by dividing the sum of squared deviations by the number of data points included, then taking the square root. (Without the square root, you have calculated the variance, which is much harder to interpret because it is squared, meaning we have to interpret it in two dimensions.) A
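The steps in this definition can be sketched in a few lines. The data values are invented for illustration. The calculation divides by the number of data points, as the definition describes; Python's `statistics.pstdev` does the same, while `statistics.stdev` divides by n - 1 (the version usually applied to samples).

```python
import math
import statistics

# Toy data, invented for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = sum(values) / n

# Step 1: deviation of each value from the mean, squared.
squared_deviations = [(v - mean) ** 2 for v in values]

# Step 2: divide the sum of squared deviations by the number of data points.
# Stopping here gives the variance, which is in squared units.
variance = sum(squared_deviations) / n

# Step 3: take the square root to return to the variable's original units.
std_dev = math.sqrt(variance)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)

# Cross-check against the standard library's divide-by-n version.
print("statistics.pstdev:", statistics.pstdev(values))
```

For these values the mean is 5, the variance is 4, and the standard deviation is 2, so a typical value sits about 2 units from the mean.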
Standard error  the measure of dispersion for a sampling distribution, calculated in a manner parallel to the standard deviation on a population.  
Statistic a value calculated from a sample of two or more data points  
Statistical significance loaded term with a lot of nuance, but the short form is that a significant value (coefficient, correlation, difference of means, etc.) has been determined to be “not zero” with the confidence level specified in the confidence interval. The standard level in the social sciences is a 95% confidence level, meaning that up to 5% of the time we might be wrong and the true relationship is zero.  
Statistical significance See Significance  
Structured Focused Comparison Between-case tool for analysis of qualitative data using an implicit regression model; variable focused and usually uses paired cases 5
Sufficient condition Asserts that some cause X always leads to the occurrence of Y; the absence of X may or may not result in Y 3
Survey experiment experiment embedded in a survey where randomized groups of respondents are assigned to different variants of a question. Differences in responses can be attributed to differences in questions as a result of randomization.  
t test Test statistic for difference of means; compares means of one group to an external referent (one-sample), or two groups/values to each other (two-sample independent or paired, respectively) B
TADA See Text as Data  
Tail(s) The trailing ends of a distribution, at the extremes away from the central tendency A
Tails (of a distribution) the long skinny ends of a distribution plot. The tails are asymptotic, meaning they approach but never touch zero even at the most extreme values. Thus, some (minuscule) probability always remains beyond any critical test value we establish that the true value is 0. As a result, we cannot ever “prove” anything with statistics.  
Tertiary source Qualitative data source characteristic: two or more layers of analysis separate the researcher from the source’s creator(s) 7
Text as data (TADA) Use of large-scale text corpora as data sources  
Text mining Process of combing texts to count items or identify passages for content analysis 7
Theory A reasoned speculation of the answer to a research question, usually phrased in terms of concepts; includes the expectation, a causal mechanism, assumptions, and scope conditions 2
Theory expression of expected relationship between two or more concepts, containing a prediction, a causal mechanism, scope conditions, and assumptions.  
Theory family Cluster of related answers to a research question; typically a subset of a scholarly literature 4
Time-series data structure of multiple observations of a single unit  
Time-series cross-sectional (TSCS) data structure of multiple units observed at multiple points in time. Sometimes called ‘panel’ data.  
Transformation Mathematical altering of the scale of a variable to create a more linear relationship 8, 9, B
Transpose command that flips a data set’s rows and columns so that rows become columns and columns become rows. Imagine putting a push pin in the top left corner of your data set, then flipping the sheet so that the bottom left corner becomes the top right corner.  
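A small sketch of what transposing does, using a made-up three-row data set stored as a list of rows. Most stats packages and spreadsheets have their own transpose command; this simply shows the idea in plain Python.

```python
# Toy data set, invented for illustration: each inner list is one row.
rows = [
    ["country", "year", "gdp"],
    ["A", 2000, 1.5],
    ["B", 2000, 2.5],
]

# zip(*rows) pairs up the first item of every row, then the second, and so on,
# so each original column comes out as a new row.
transposed = [list(column) for column in zip(*rows)]

for row in transposed:
    print(row)

# Transposing twice returns the original layout.
round_trip = [list(column) for column in zip(*transposed)]
```

The bottom-left cell of the original (`"B"`) ends up in the top-right corner of the transposed version, and the top-left cell stays where it is.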
Triangulation Reinforcing conclusions drawn from (primarily) qualitative data by deploying findings or evidence from other types of data or sources 7
Truncate(d) a distribution is truncated if a natural or human-imposed process artificially restricts the range of values observed. A distribution is right-truncated if the highest values are cut off and left-truncated if the lowest values are cut off. Center truncation is possible, such that only the highest and lowest values are observed, but this is unlikely to occur in practice.  
t-test test for significance of a difference of means, either between the means of two samples or between a sample mean and a (known) population value. Uses the Student’s t distribution (slightly fatter tails than a normal distribution, up to about 100 observations) to calculate the probability of the difference being 0.  
Uber-concept a covering concept that contains multiple basic concepts in their own right. Power, regime type, and conflict are examples. (Term is of Leanne’s origin)  
Underlying causes Characteristics of the environment that facilitate or contribute to the occurrence of some outcome of interest 5
Uniform distribution Distribution for discrete data where each value is equally probable A
Uniform distribution distribution for discrete or continuous variables where all variable values are equally likely (the probability density function (PDF) is a horizontal line). In practice, most uniform distributions are discrete values like the value of a fairly rolled, unweighted die, where each value is equally likely to be observed over a large number of trials.  
Unit of analysis The item that constitutes one observation in a quantitative study: decision, individual opinion, country, dyad, state-year, etc. 2, 8
Univariate Family of statistics, like descriptive statistics, calculated on a single variable A
Validity Characteristic of an indicator: the indicator captures the concept of interest and nothing else 3, 5, 7, 8
Variable (1) a characteristic of a phenomenon whose values can vary across instances of the phenomenon  
Variable (2) In quantitative analysis, name of the column in your dataset; shortest and most abbreviated representation of the data contents (Example in Table 8.1) 9, A
Variable label Longest form of variable description, used in tables and graphs (Example in Table 9.1) 9
Variable name Form of data content referenced in textual discussion and listed in results table; uses words instead of abbreviations (Example in Table 9.1) 9
Variance two-dimensional measure of dispersion for an interval-ratio variable; what you get in the standard deviation process before you take the square root. Difficult to interpret so standard deviation is often preferred.  
V-Dem Varieties of Democracy data project  
Venn diagram Graphical organizer used to show overlap between sets 2
x-bar mathematical notation for the mean of a distribution  
Z score Calculated as the number of standard deviations from the mean A
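As a quick illustration of the Z score entry, the sketch below converts raw values into the number of standard deviations each sits from the mean. The data and the helper name `z_score` are made up for illustration, and the standard deviation here is the divide-by-n version.

```python
import statistics

# Toy data, invented for illustration only (mean 5, standard deviation 2).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(values)
sd = statistics.pstdev(values)  # divides by the number of data points


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_scores = [z_score(v, mean, sd) for v in values]

for v, z in zip(values, z_scores):
    print(v, "->", z)
```

A value of 9 is two standard deviations above the mean, while a value of 2 is one and a half standard deviations below it.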


Site contents (c) Leanne C. Powner, 2012-2026.
Background graphic: filo / DigitalVision Vectors / Getty Images.
Cover graphic: Cambridge University Press.
