Sunday, May 29, 2011
Quantitative Literary Studies
Fascinating article on the attempts by Franco Moretti, a Stanford English professor and member of the Stanford Literary Lab, to create a field of quantitative literary studies by digitizing and statistically analyzing literary texts.
Thursday, February 03, 2011
Daniel Bell, Master Sociologist
The NYT posted an excellent profile of my late friend Dan Bell, the master sociologist (and big thinker). Even though he was a big-thinking social theorist, I remember that Dan told me, emphatically in fact, that he was a quantitative sociologist! This makes sense, since his big books often included quantitative trend data (presenting the trends themselves is in many ways advantageous over modeling the data and then focusing on the model parameters, such as regression coefficients or standard errors).
Monday, January 18, 2010
Why are Professors Liberal?
Check out this really cool article (okay, so I'm biased) on professorial politics in the New York Times.
Tuesday, December 29, 2009
Top Ten Must-Have R Packages for Social Scientists
The political scientist Drew Conway has come up with a useful list of his ten "must-have" R packages for social scientists. I agree with him for the most part, and his list highlights the usefulness of R (vis-a-vis Stata) for social network analysis (see statnet/igraph) and graphics (see ggplot2). In some respects, his list also underscores the fact that R is arguably better suited for sociological data analysis than Stata, given the former's unique packages not only for social network analysis but also for multilevel modeling and for a variety of non-parametric methods (including more recent matching and classification techniques); such methods were especially popular in sociology before the "path analysis" revolution of the 1960s.
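As a minimal illustration of two packages from Conway's list, here is a hedged R sketch that builds a small simulated network with igraph and plots its degree distribution with ggplot2 (the toy data and object names are my own, not from his post):

# Toy sketch: a simulated network analyzed with igraph, plotted with ggplot2
library(igraph)
library(ggplot2)

set.seed(1)
g <- erdos.renyi.game(n = 50, p.or.m = 0.1)   # random "friendship" network
deg <- degree(g)                              # degree of each node

ggplot(data.frame(degree = deg), aes(x = degree)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Degree distribution of a simulated network",
       x = "Degree", y = "Count")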
Tuesday, December 22, 2009
Multilevel and Longitudinal Modeling in Stata
For my "off-task" reading I recent perused an excellent book on multilevel and longitudinal modeling in Stata by Sophia Rabe-Hesketh and Anders Skrondal. The second edition (which I read) has been updated by including several chapters providing an overview of regression modeling and ANOVA (analysis of variance) as well as additional background information on models with nonlinear outcomes (e.g., logistic regression). The authors even include a self-test near the beginning of the book to ensure that readers can confidently progress through the rest of the material. The book has many great features, including ease of data accessibility (simply go to this website and you instantly have all the datasets used in the book), clarity of presentation, and numerous applied examples with accompanying Stata code. The only problem, which is not a problem with the book, is that multilevel modeling in Stata (as the authors note) can be rather slow, especially for nonlinear outcomes with many levels. (For this reason, when using nonlinear outcomes other statistical packages may be more desirable than Stata, such as R.) Yet overall the book is an excellent overview of an important class of statistical models, and can even be viewed as a way of take advantage of Stata beyond the realm of "econometric" approaches (which seems to be Stata's strength) and toward the realm of putatively more "sociologic" methods of data analysis, in which clustered data are viewed as something important in their own right rather than as statistical nuisances.
Thursday, December 17, 2009
Sociology = Hedge Fund?
Some people have blamed hedge funds (whose managers can earn extraordinarily high returns) for contributing to the high level of economic inequality in the USA. And who invented the hedge fund? We can thank the sociologist Alfred Winslow Jones, Ph.D.
Monday, December 14, 2009
A Quantitative Tour of the Social Sciences
I just read "A Quantitative Tour of the Social Sciences," edited by Andrew Gelman and Jeronimo Cortina. I highly recommend the book for anyone who does quantitative research, including part-time quantitative analysts and ambitious undergraduates. The aim of the book is to expose the reader to the similarities and differences in quantitative thinking across five core social science disciplines: history (a welcome but oft neglected member of the social sciences), economics, sociology, political science, and psychology. The editors are unabashedly in favor of methodological pluralism, and present as diverse set of views as possible. What is notable about this volume is that for each discipline the authors have included exercises ranging from conceptual questions to hands-on data analyses. From my perspective, especially illuminating chapters include Andrew Gelman's thoroughly informative discussion of the application of game theory to trench warfare (in part because he discusses the criticisms of his paper as it went through peer review) and Jeronimo Cortina's overview of the potential outcomes model of causality (which, while familiar to more advanced readers, is presented with enviable clarity).
The chapters capture most of the differences among the disciplines in quantitative thinking. However, a few differences in mathematical modeling may be missed. In particular, likely reflecting an enduring interest in social context and interconnections among individuals, sociologists tend to use multilevel models and social network methods more frequently than other social scientists. As well, economists are much more likely to focus on trying to interpret observational data causally through the use of instrumental variables and, to a lesser extent, regression discontinuity design. Notwithstanding, overall this book is a welcome addition to the bookshelf of any scholar who does quantitative work.
Creating Summated Scales
The sociologist Paul Millar has created a very useful Stata tool called optifact for creating summated scales. Often social scientists simply throw together some variables, report a Cronbach's alpha as a measure of reliability, and then move on to other analyses. But there are some well-known problems with Cronbach's alpha; in particular, alpha tends to increase simply as more items are added to the scale. Millar's program is useful because, for items that load on a single dimension, it lists candidate scales by increasing number of items and, within each size, in descending order of Cronbach's alpha. For instance, the program will first list all scales with, say, two items that load on one factor, starting with the scale with the highest Cronbach's alpha. This way the analyst can easily create summated scales that are parsimonious (i.e., consist of few items), unidimensional (i.e., load on one factor), and reliable (i.e., have a high Cronbach's alpha).
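The command itself is written for Stata, but the underlying logic is easy to sketch. Here is a rough R analogue of that logic (my own illustration, not Millar's code): it simulates four items tapping a single latent dimension, computes Cronbach's alpha for every candidate scale of two or more items, and sorts the candidates by size and then by descending alpha.

# Rough R sketch of the logic behind sorting candidate summated scales
# (simulated items; an illustration, not the optifact command itself)
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

set.seed(7)
latent <- rnorm(200)
items  <- as.data.frame(sapply(1:4, function(i) latent + rnorm(200)))
names(items) <- paste0("item", 1:4)

# Enumerate all candidate scales with at least two items
combos <- unlist(lapply(2:4, function(k) combn(names(items), k, simplify = FALSE)),
                 recursive = FALSE)
results <- data.frame(
  n_items = sapply(combos, length),
  scale   = sapply(combos, paste, collapse = " + "),
  alpha   = sapply(combos, function(v) cronbach_alpha(items[, v]))
)

# Fewest items first; within each size, highest alpha first
results[order(results$n_items, -results$alpha), ]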
Simpson's Paradox Strikes Again
The Wall Street Journal recently published an article on Simpson's paradox and the jobless rate in the USA. When aggregated, the jobless rate is lower today than in the 1980s; however, when broken down by educational level, the jobless rate for each educational group is higher. How can this be? The overall rate is lower today because college graduates, who tend to have a lower unemployment rate than other groups even in good economic times, make up a larger proportion of the population today than in the past. Thus, the weighted average across all educational groups gives much more weight to college graduates today than in the 1980s, pulling the overall unemployment rate down even though every educational group is faring worse.
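A toy calculation makes the arithmetic clear; the rates and population shares below are invented for illustration and are not the figures from the article. In R:

# Simpson's paradox with invented numbers (not the actual WSJ/BLS figures)
rate_1980s  <- c(no_college = 0.10, college = 0.03)  # group jobless rates, 1980s
rate_today  <- c(no_college = 0.12, college = 0.05)  # every group is worse off today
share_1980s <- c(no_college = 0.80, college = 0.20)  # population shares, 1980s
share_today <- c(no_college = 0.40, college = 0.60)  # more college graduates today

sum(rate_1980s * share_1980s)  # overall rate, 1980s: 0.086
sum(rate_today * share_today)  # overall rate, today: 0.078 (lower, despite worse group rates)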