Sunday, May 29, 2011
Quantitative Literary Studies
Fascinating article on the attempts by Franco Moretti, a Stanford English professor and member of the Stanford Literary Lab, to create a field of quantitative literary studies by digitizing and statistically analyzing literary texts.
Thursday, February 03, 2011
Daniel Bell, Master Sociologist
The NYT posted an excellent profile of my late friend Dan Bell, the master sociologist (and big thinker). Even though he was a big-thinking social theorist, I remember that Dan told me, emphatically in fact, that he was a quantitative sociologist! This makes sense, since his big books often included quantitative trend data (presenting the trends themselves is in many ways advantageous over modeling the data and then focusing on the model parameters, such as regression coefficients or standard errors).
Monday, January 18, 2010
Why are Professors Liberal?
Check out this really cool article (okay, so I'm biased) on professorial politics in the New York Times.
Tuesday, December 29, 2009
Top Ten Must-Have R Packages for Social Scientists
The political scientist Drew Conway has come up with a useful list of his ten "must-have" R packages for social scientists. I agree with him for the most part, and his list highlights the usefulness of R (vis-a-vis Stata) for social network analysis (see statnet/igraph) and graphics (see ggplot2). In some respects, his list also underscores the fact that R is arguably better suited for sociological data analysis than Stata, given the former's unique packages not only for social network analysis but also for multilevel modeling and for a variety of non-parametric methods (including more recent matching and classification techniques); such methods were especially popular in sociology before the "path analysis" revolution of the 1960s.
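As a minimal illustration of two packages from Conway's list, here is a hedged R sketch that builds a small simulated network with igraph and plots its degree distribution with ggplot2 (the toy data and object names are my own, not from his post):

# Toy sketch: a simulated network analyzed with igraph, plotted with ggplot2
library(igraph)
library(ggplot2)

set.seed(1)
g <- erdos.renyi.game(n = 50, p.or.m = 0.1)   # random "friendship" network
deg <- degree(g)                              # degree of each node

ggplot(data.frame(degree = deg), aes(x = degree)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Degree distribution of a simulated network",
       x = "Degree", y = "Count")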
Tuesday, December 22, 2009
Multilevel and Longitudinal Modeling in Stata
For my "off-task" reading I recent perused an excellent book on multilevel and longitudinal modeling in Stata by Sophia Rabe-Hesketh and Anders Skrondal. The second edition (which I read) has been updated by including several chapters providing an overview of regression modeling and ANOVA (analysis of variance) as well as additional background information on models with nonlinear outcomes (e.g., logistic regression). The authors even include a self-test near the beginning of the book to ensure that readers can confidently progress through the rest of the material. The book has many great features, including ease of data accessibility (simply go to this website and you instantly have all the datasets used in the book), clarity of presentation, and numerous applied examples with accompanying Stata code. The only problem, which is not a problem with the book, is that multilevel modeling in Stata (as the authors note) can be rather slow, especially for nonlinear outcomes with many levels. (For this reason, when using nonlinear outcomes other statistical packages may be more desirable than Stata, such as R.) Yet overall the book is an excellent overview of an important class of statistical models, and can even be viewed as a way of take advantage of Stata beyond the realm of "econometric" approaches (which seems to be Stata's strength) and toward the realm of putatively more "sociologic" methods of data analysis, in which clustered data are viewed as something important in their own right rather than as statistical nuisances.
Thursday, December 17, 2009
Sociology = Hedge Fund?
Some people have blamed hedge funds (whose managers can earn extraordinarily high returns) for contributing to the high level of economic inequality in the USA. And who invented the hedge fund? We can thank the sociologist Alfred Winslow Jones, Ph.D.
Monday, December 14, 2009
A Quantitative Tour of the Social Sciences
I just read "A Quantitative Tour of the Social Sciences," edited by Andrew Gelman and Jeronimo Cortina. I highly recommend the book for anyone who does quantitative research, including part-time quantitative analysts and ambitious undergraduates. The aim of the book is to expose the reader to the similarities and differences in quantitative thinking across five core social science disciplines: history (a welcome but oft neglected member of the social sciences), economics, sociology, political science, and psychology. The editors are unabashedly in favor of methodological pluralism, and present as diverse set of views as possible. What is notable about this volume is that for each discipline the authors have included exercises ranging from conceptual questions to hands-on data analyses. From my perspective, especially illuminating chapters include Andrew Gelman's thoroughly informative discussion of the application of game theory to trench warfare (in part because he discusses the criticisms of his paper as it went through peer review) and Jeronimo Cortina's overview of the potential outcomes model of causality (which, while familiar to more advanced readers, is presented with enviable clarity).
The chapters capture most of the differences among the disciplines in quantitative thinking. However, a few differences in mathematical modeling may be missed. In particular, likely reflecting an enduring interest in social context and interconnections among individuals, sociologists tend to use multilevel models and social network methods more frequently than other social scientists. As well, economists are much more likely to focus on trying to interpret observational data causally through the use of instrumental variables and, to a lesser extent, regression discontinuity design. Notwithstanding, overall this book is a welcome addition to the bookshelf of any scholar who does quantitative work.
Creating Summated Scales
The sociologist Paul Millar has created a very useful Stata tool called optifact for creating summated scales. Often social scientists simply throw together some variables, report a Cronbach's alpha as a measure of reliability, and then move on to other analyses. But there are some well-known problems with Cronbach's alpha; in particular, alpha tends to increase simply as more items are added to the scale. Millar's program is useful because, for items that load on a single dimension, it lists candidate scales by increasing number of items and, within each size, in descending order of Cronbach's alpha. For instance, the program will first list all scales with, say, two items that load on one factor, starting with the scale with the highest Cronbach's alpha. This way the analyst can easily create summated scales that are parsimonious (i.e., consist of few items), unidimensional (i.e., load on one factor), and reliable (i.e., have a high Cronbach's alpha).
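The command itself is written for Stata, but the underlying logic is easy to sketch. Here is a rough R analogue of that logic (my own illustration, not Millar's code): it simulates four items tapping a single latent dimension, computes Cronbach's alpha for every candidate scale of two or more items, and sorts the candidates by size and then by descending alpha.

# Rough R sketch of the logic behind sorting candidate summated scales
# (simulated items; an illustration, not the optifact command itself)
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

set.seed(7)
latent <- rnorm(200)
items  <- as.data.frame(sapply(1:4, function(i) latent + rnorm(200)))
names(items) <- paste0("item", 1:4)

# Enumerate all candidate scales with at least two items
combos <- unlist(lapply(2:4, function(k) combn(names(items), k, simplify = FALSE)),
                 recursive = FALSE)
results <- data.frame(
  n_items = sapply(combos, length),
  scale   = sapply(combos, paste, collapse = " + "),
  alpha   = sapply(combos, function(v) cronbach_alpha(items[, v]))
)

# Fewest items first; within each size, highest alpha first
results[order(results$n_items, -results$alpha), ]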
Simpson's Paradox Strikes Again
The Wall Street Journal recently published an article on Simpson's paradox and the jobless rate in the USA. When aggregated, the jobless rate is lower today than in the 1980s; however, when broken down by educational level, the jobless rate for each educational group is higher. How can this be? The overall rate is lower today because college graduates, who tend to have a lower unemployment rate than other groups even in good economic times, make up a larger proportion of the population today than in the past. Thus, the weighted average across all educational groups gives much more weight to college graduates today than in the 1980s, pulling the overall unemployment rate down even though every educational group is faring worse.
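A toy calculation makes the arithmetic clear; the rates and population shares below are invented for illustration and are not the figures from the article. In R:

# Simpson's paradox with invented numbers (not the actual WSJ/BLS figures)
rate_1980s  <- c(no_college = 0.10, college = 0.03)  # group jobless rates, 1980s
rate_today  <- c(no_college = 0.12, college = 0.05)  # every group is worse off today
share_1980s <- c(no_college = 0.80, college = 0.20)  # population shares, 1980s
share_today <- c(no_college = 0.40, college = 0.60)  # more college graduates today

sum(rate_1980s * share_1980s)  # overall rate, 1980s: 0.086
sum(rate_today * share_today)  # overall rate, today: 0.078 (lower, despite worse group rates)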