In doing research, it is easy to slip into treating the models one is
using as if they were the data, but they are not. Even the mean of a
sample is a model of the central tendency of the data, and not the data
itself. One clear example of why models are not data is Anscombe's
quartet, a set of four small data sets usually shown as four scatter
plots.
What is remarkable about this quartet is that across all four scatter
plots the mean of x is exactly the same, the variance of x is exactly
the same, the mean of y is the same to two decimal places, the variance
of y is nearly the same (about 4.125, within ±0.003), the correlation
between x and y is nearly the same (about 0.816), and the linear
regression equation is the same (intercept to two decimal places, slope
to three). In other words, the models of the data (mean, variance,
correlation, and so on) are the same, but the data are not!
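To check these claims by hand, here is a minimal sketch in plain Python. The values are the published Anscombe (1973) data; the names `ANSCOMBE`, `summarize`, and `summaries` are my own, chosen for illustration, and the variance is the sample variance (dividing by n - 1).

```python
from math import sqrt

# Anscombe's quartet: data sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary 'models' of one data set as a dict."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least squares fit of y on x
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

Running it prints one line of summaries per data set, which makes it easy to see how little the numbers distinguish the four.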
So what's the solution? As
I've mentioned in previous posts, graphing the data is crucial, because
we're forced to confront the actual data, and not models of the data.
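As a rough illustration of what graphing reveals, here is a small sketch that draws crude text scatter plots using only the Python standard library. The function `text_scatter` and its `width` and `height` parameters are hypothetical names of my own, and a real plotting tool would do this far better. The data are the published Anscombe values.

```python
# Anscombe's quartet: data sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def text_scatter(xs, ys, width=40, height=12):
    """Return a crude character-grid scatter plot of the points."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to 1 so a constant variable does not divide by zero.
    x_range = (x_max - x_min) or 1
    y_range = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_range * (width - 1))
        row = round((y - y_min) / y_range * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("|" + "".join(line) for line in grid)


plots = {name: text_scatter(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, plot in plots.items():
    print(f"Data set {name}")
    print(plot)
    print()
```

Even at this resolution the four data sets look nothing alike: one roughly linear cloud, one smooth curve, one line with a single outlier, and one vertical stack of points with a lone point far to the right.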