09 Preparing to Analyze Quantitative Data

Summary

Analyzing data requires more than simply running one model and reporting the results. Getting trustworthy results requires a careful look at the data: identifying and addressing missing values, nonlinearities, and collinearity, and generating any necessary composite or recoded variables, such as interaction terms, scales or indices, or lagged variables. Problems also arise when messy real-world data meet assumption-laden models. Endogeneity, simultaneity, omitted variable bias, unmodeled fixed effects, and dichotomous DVs all violate key assumptions of the OLS model and require appropriate statistical and/or theoretical adjustments to produce trustworthy results.
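The flashcards on this page assume Stata, but the data-preparation steps above follow the same logic in any package. This is a minimal Python sketch using only the standard library. It counts missing values, then builds a log-transformed variable, a lagged variable, and an interaction term. The dataset is hypothetical: the variable names (`country`, `year`, `gdp`, `dem`, `trade`) and all values are invented for illustration, and `None` stands in for a missing value.

```python
import math
from collections import defaultdict

# Hypothetical country-year records; None marks a missing value.
rows = [
    {"country": "A", "year": 2000, "gdp": 100.0, "dem": 1, "trade": 0.4},
    {"country": "A", "year": 2001, "gdp": 110.0, "dem": 1, "trade": None},
    {"country": "B", "year": 2000, "gdp": 50.0, "dem": 0, "trade": 0.2},
    {"country": "B", "year": 2001, "gdp": 55.0, "dem": 0, "trade": 0.3},
]

def count_missing(data, variables):
    """Return the number of missing (None) values for each variable."""
    return {v: sum(1 for r in data if r[v] is None) for v in variables}

def add_log(data, var):
    """Add a natural-log version of `var`; the log is undefined for values <= 0."""
    for r in data:
        x = r[var]
        r["ln_" + var] = math.log(x) if x is not None and x > 0 else None

def add_lag(data, var, unit="country", time="year"):
    """Add the prior period's value of `var` within each unit."""
    by_unit = defaultdict(list)
    for r in data:
        by_unit[r[unit]].append(r)
    for unit_rows in by_unit.values():
        unit_rows.sort(key=lambda r: r[time])
        prev = None
        for r in unit_rows:
            # Use the previous row only if it is exactly one period earlier.
            if prev is not None and r[time] - prev[time] == 1:
                r["lag_" + var] = prev[var]
            else:
                r["lag_" + var] = None
            prev = r

def add_interaction(data, a, b):
    """Add the product of `a` and `b`; missing if either component is missing."""
    for r in data:
        x, y = r[a], r[b]
        r[a + "_x_" + b] = x * y if x is not None and y is not None else None

print(count_missing(rows, ["gdp", "dem", "trade"]))  # {'gdp': 0, 'dem': 0, 'trade': 1}
add_log(rows, "gdp")
add_lag(rows, "gdp")
add_interaction(rows, "dem", "trade")
```

When the prior year is absent, the lag is set to missing rather than quietly borrowing an older value. That is a design choice, and your data may call for a different rule.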
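Omitted variable bias, and the instrumental-variable fix for it, are easiest to see in simulated data where the true effect is known. This is a minimal sketch under stated assumptions, and all values are simulated rather than real:

- `u` is a hypothetical confounder left out of the model.
- `z` is a hypothetical instrument. It moves `x`, is unrelated to `u`, and affects `y` only through `x`.
- The true effect of `x` on `y` is set to 2.

The sketch compares a naive bivariate OLS slope with a two-stage least squares estimate computed by hand.

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    return cov(x, y) / cov(x, x)

n = 10_000
u = [random.gauss(0, 1) for _ in range(n)]  # omitted confounder
z = [random.gauss(0, 1) for _ in range(n)]  # instrument: moves x, unrelated to u
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2 * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]  # true effect of x is 2

naive = ols_slope(x, y)  # biased upward because u is omitted

# Stage 1: predict x from the instrument z.
b1 = ols_slope(z, x)
mz, mx = sum(z) / n, sum(x) / n
x_hat = [mx + b1 * (zi - mz) for zi in z]
# Stage 2: regress y on the predicted x.
iv = ols_slope(x_hat, y)

print(f"naive OLS: {naive:.2f}, 2SLS: {iv:.2f}, cov(z,y)/cov(z,x): {cov(z, y) / cov(z, x):.2f}")
```

Under these assumptions the naive slope lands near 3 rather than 2. The two-stage estimate lands near 2 and equals the simple covariance ratio. Only the point estimate is meaningful here: standard errors from a hand-run second stage are wrong, so for real inference use a dedicated routine such as Stata's `ivregress 2sls`.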

Articles

Web Extra: Lagged Variables
Web Extra: Omitted Variable Bias and Post-Estimation Plots
Web Extra: A Brief Introduction to Probit and Logit
Web Extra: Favorite Stats Resource Sites
Working with Survey & Experimental Data
Nonlinear relationships in theory and data

Vocab Flashcards

[qdeck random="true"]

[q] Stata file containing a list of commands to be executed in a single batch

[a] *.do file

[q] Goodness of fit measure for multivariate OLS regression; adjusts baseline R² to account for degrees of freedom lost to additional IVs; interpreted similarly to R²

[a] Adjusted R²

[q] Excessive correlation between IVs of a multivariate model; perfect collinearity violates a core assumption of regression and similar models, and high collinearity inflates standard errors; correlations above about 0.6 typically call for correction

[a] Collinearity

[q] Variables created by combining and/or manipulating other variables’ values

[a] Composite variables

[q] Statistic indicating how closely the components of a composite indicator capture the same underlying concept; higher values indicate better internal consistency, with a commonly accepted cutoff around 0.7

[a] Cronbach’s alpha (α)

[q] Process of ensuring that data is correctly coded, missing data is appropriately handled, and other possible errors are detected and addressed

[a] Data cleaning

[q] Problematic circumstance where IVs cause one another; requires appropriate fixes to estimate relationships accurately

[a] Endogeneity

[q] Systematic variation across the units in a study that is correlated with both the DV and the IV of interest; or, the battery of dummy variables included in a model to capture that variation

[a] Fixed effects

[q] Tool of data reduction: combines multiple indicators into a single measure, often by averaging

[a] Index (pl: indices)

[q] One common “solution” to endogeneity problems; a variable that is correlated with the problem IV but affects the DV only through that IV is used as a stand-in of sorts; usually estimated with Two-Stage Least Squares (2SLS)

[a] Instrumental variable

[q] Composite variable generated by multiplying two or more component variables together; used to test conditional hypotheses

[a] Interaction term

[q] Variables observed in a period prior to the one under analysis; for example, GDP from the previous year

[a] Lagged variables

[q] Logarithmic transformation of a variable exhibiting substantial skew or an exponential pattern; the natural log of x (defined only for x > 0) is the power to which the mathematical constant e must be raised to obtain x

[a] Log transformation

[q] Incorrect estimates of relationships (qualitative or quantitative) resulting from failure to consider a relevant variable

[a] Omitted variable bias (OVB)

[q] Applying a new scale or coding scheme to existing data to change its level of measurement or other characteristics

[a] Recoding

[q] Tool of data reduction: combines multiple indicators into a single measure, usually by summation

[a] Scale or Index

[q] Special case of endogeneity where the DV causes one or more IVs; requires appropriate fixes to estimate relationships accurately

[a] Simultaneity

[q] Incorrect coefficients (or qualitative relationship estimates) obtained from neglecting to consider the effect of Y on X as well as the effect of X on Y

[a] Simultaneity bias

[q] Particular combination of variables included in a statistical model

[a] Specification

[q] Longest form of variable description, used in tables and graphs

[a] Variable label

[q] Form of variable description used in textual discussion and listed in results tables; uses words instead of abbreviations

[a] Variable name

[q] Generic data file type that many platforms can open

[a] *.csv

[q] Roughly speaking, the number of independent pieces of information available for analysis after various calculations constrain the data

[a] Degrees of freedom

[q] Mathematical alteration of a variable’s scale to create a more linear relationship

[a] Transformation

[q] Variables correlated with both the DV and IV of interest

[a] Confounding variable

[q] Goodness of fit measure for bivariate OLS regression; interpreted as the percentage of variation in the DV explained by variation in the IV

[a] R²

[q] In quantitative analysis, name of the column in your dataset; shortest and most abbreviated representation of the data contents

[a] Variable

[q] In the statistical sense, an indication that the observed relationship is unlikely to have arisen by chance if the true relationship were 0

[a] Significance

[/qdeck]
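The R² and adjusted R² cards can be made concrete with a small worked example. R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. Adjusted R² is commonly computed as 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of IVs. This minimal sketch fits a bivariate OLS regression by hand on five invented data points.

```python
def ols_bivariate(x, y):
    """Return the intercept and slope of a bivariate OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(y, y_hat):
    """Share of the DV's variation around its mean explained by the model."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the degrees of freedom used by k IVs plus the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Invented example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_bivariate(x, y)
y_hat = [a + b * xi for xi in x]
r2 = r_squared(y, y_hat)
print(round(r2, 3), round(adjusted_r_squared(r2, n=len(y), k=1), 3))  # 0.6 0.467
```

Even with a single IV, the adjustment pulls the value down noticeably when n is this small. Each added IV costs another degree of freedom.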
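Cronbach’s alpha compares the variances of the individual indicators with the variance of their sum: α = k/(k - 1) × (1 - sum of item variances / variance of the summed scale), where k is the number of indicators. This is a minimal sketch on invented responses from five hypothetical respondents to three hypothetical survey items.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: k lists, one per indicator, each holding the same n respondents' scores."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed scale score per respondent
    item_var = sum(variance(col) for col in items)    # sum of the sample variances of each item
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical responses of five respondents to three survey items
items = [
    [3, 4, 4, 5, 2],
    [3, 5, 4, 4, 2],
    [2, 4, 5, 5, 1],
]
print(round(cronbach_alpha(items), 3))  # 0.922
```

With these made-up responses, alpha comes out around 0.92, well above the 0.7 rule of thumb on the card.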

Review Quiz

[qwiz random="true" random_mc="true"]

[q] Variables that include observed values from the prior period are known as _____.

[c]IGNvbnRyb2wgdmFyaWFibGVz[Qq]

[c]IGR1bW15IHZhcmlhYmxlcw==[Qq]

[c]IGxhZ2dlZCB2YXJpYWJsZXM=[Qq]

[c]IHJvYnVzdG5lc3MgY2hlY2tz[Qq]

[c]IGluZmxhdGVkIHZhcmlhYmxlcw==

Cg==[Qq]

[q random_mc="false"] Depending on your theory and data issues, you might transform a variable by _____.

[c]IHRha2luZyBpdHMgbmF0dXJhbCBsb2c=[Qq]

[c]c3F1YXJpbmcgaXQ=[Qq]

[c]IGRlLW1lYW5pbmcgaXQgKHN1YnRyYWN0aW5nIHRoZSBtZWFuIGZyb20gZWFjaCBvYnNlcnZhdGlvbiBhbmQgdXNpbmcgdGhlIGRldmlhdGlvbiB0aGF04oCZcyBsZWZ0KQ==[Qq]

[c]IGFsbCBvZiB0aGUgYWJvdmU=[Qq]

[c]IG5vbmUgb2YgdGhlIGFib3Zl

Cg==[Qq]

[q random_mc="false"] When recoding data, you should ___.

[c]IG5ldmVyIGNvZGUgb3ZlciB5b3VyIG9yaWdpbmFsIGRhdGE=[Qq]

[c]IGFsd2F5cyBjcmVhdGUgYSBjb3B5IG9mIHRoZSBvcmlnaW5hbCB2YXJpYWJsZSBmaXJzdA==[Qq]

[c]IGNhcmVmdWxseSBjcmVhdGUgYSByZWNvZGUgcGxhbiB0aGF0IHByZXNlcnZlcyBvcmlnaW5hbCB2YWx1ZXM=[Qq]

[c]IGNvbmZpcm0gdmlhIGNyb3NzdGFidWxhdGlvbiB0aGF0IHRoZSBvcmlnaW5hbCBhbmQgbmV3IHZlcnNpb25zIG1hdGNoIGFzIGRlc2lyZWQ=[Qq]

[c]IGFsbCBvZiB0aGUgYWJvdmU=

Cg==[Qq]

[q random_mc="false"] Omitted variable bias can cause our coefficients’ ___ to be wrong.

[c]IHNpZ24=[Qq]

[c]IG1hZ25pdHVkZSAoc2l6ZSk=[Qq]

[c]IHN0YW5kYXJkIGVycm9yICjigJhzdXJlbmVzc+KAmS9zaWduaWZpY2FuY2Up[Qq]

[c]IGFsbCBvZiB0aGUgYWJvdmU=[Qq]

[c]IG5vbmUgb2YgdGhlIGFib3Zl

Cg==[Qq]

[q random_mc="false"] Probit and logit models allow us to ____.

[c]IG1vZGVsIGRpY2hvdG9tb3VzIG91dGNvbWVz[Qq]

[c]IGluY2x1ZGUgcHJvYmFiaWxpdHkgYW5kIGxvZ2dlZCB2YXJpYWJsZXM=[Qq]

[c]IGV2YWx1YXRlIG11bHRpcGxlIGRlcGVuZGVudCB2YXJpYWJsZXMgaW4gdGhlIHNhbWUgbW9kZWw=[Qq]

[c]IGFsbCBvZiB0aGUgYWJvdmU=[Qq]

[c]IG5vbmUgb2YgdGhlIGFib3Zl[Qq]

[/qwiz]
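The recoding question above mentions several safe-recoding practices: writing a recode plan, keeping the original values, and checking the result with a crosstabulation. This minimal sketch shows them together. The item, its codes, and the plan are hypothetical: a 5-point agreement item where 9 means "don't know", collapsed to three categories with 9 set to missing.

```python
from collections import Counter

# Hypothetical 5-point agreement item (1 = strongly disagree ... 5 = strongly agree); 9 = "don't know"
original = [1, 2, 5, 4, 9, 3, 5, 2]

# Recode plan written down before touching the data: collapse to three categories
# and treat 9 ("don't know") as missing (None).
recode_plan = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 9: None}

# Build a NEW variable; the original list is never overwritten.
recoded = [recode_plan[v] for v in original]

# Cross-tabulate original against recoded values to confirm each code landed where planned.
crosstab = Counter(zip(original, recoded))
for (old, new), count in sorted(crosstab.items()):
    print(f"original={old} -> recoded={new}: {count}")
```

Because the plan is a lookup table, an unexpected code in the data raises a `KeyError` instead of slipping through unrecoded. In Stata, the equivalent check is a two-way `tabulate` of the original and new variables.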

Site contents (c) Leanne C. Powner, 2012-2026.
Background graphic: filo / DigitalVision Vectors / Getty Images.
Cover graphic: Cambridge University Press.
