Culture, Statistics, and Society: March 2012

Friday, March 30, 2012

Physics Envy

The NYT published an op-ed today by a pair of political scientists on "physics envy" among sociologists, economists, and political scientists. The authors mainly argue that theory can be useful even when it is wrong or unsupported by data, and briefly mention that data analysis is useful even if theoretical contributions are not obvious. I disagree with the former, but not the latter. For a similar view, see this post by the theoretical physicist Sean Carroll.

Thursday, March 29, 2012

Irving Louis Horowitz

The eminent political sociologist died a few days ago, according to an obit in the NYT. Long ago I read, and took seriously, his book The Decomposition of Sociology, in which he argues (essentially) for more empirical analysis and less left-wing politics in sociology. Reflecting on the book now, I think he neglects a fundamental potential cultural contradiction: to the extent that social reality exhibits facts consistent with liberalism and inconsistent with conservatism, empirical analysis will result in more liberal than conservative belief systems (but not values, since those cannot be proven "right" or "wrong" by scientific analysis). For example, evidence is accumulating that economic inequality (which is of little concern to most conservatives in the United States) has numerous deleterious effects, thus forcing conservatives either to hold beliefs inconsistent with the evidence (i.e., inequality is unrelated to deleterious effects) or to alter their values (i.e., it is a "good" thing to have high rates of violence, low social mobility, and so forth).

Wednesday, March 28, 2012

MyPersonality

I highly recommend this website for learning about your attitudes, values, beliefs, and overall personality.

Sunday, March 25, 2012

Why are Economists so (Consistently) Led Astray About Inequality?

In a recent Boston Globe article titled Why income disparity in Boston isn't a bad thing, Ed Glaeser, a conservative urban economist at Harvard, argues that rising inequality in the city is not a cause for concern. Glaeser is right that inequality increases in a city such as Boston can be due to selection effects, since poor people are moving into Boston for economic and cultural opportunities. Yet these selection effects (i.e., poor people moving into a geographic area in the hopes of upward mobility, which is generally considered a good thing) are drastically different from the observed outcomes (i.e., large disparities in people's wealth due to their social positions in a system of occupations, which is generally considered a bad thing). Glaeser nonetheless conflates the two, confusing the reader and, perhaps, himself. A more accurate title for the article would have been "Why poor people moving into Boston isn't a bad thing." This raises a question: why are economists so (consistently) led astray about the causes and consequences of economic, social, and political inequality?

Popularity of Programming Languages

As you can see, R is relatively popular (but more so on StackOverflow than GitHub):

For the original graph, click here. This scatter plot is a reminder that R is useful to learn not only for statistical modeling (since there are so many excellent packages available), but also as a way to become familiar with programming more generally.

Saturday, March 24, 2012

Big Science and Sociology

I highly recommend this video featuring Dirk Helbing, a sociologist and erstwhile physicist who is (along with others) attempting to create a CERN-like society-simulating project for the social sciences by combining information from large data sets with simulated models of complex social systems:

Thursday, March 22, 2012

Statistical Lexicon

Anyone doing statistical analysis (or contemplating it) should read Andy Gelman's informative, humorous, and dead-on correct post on statistical lexicon.

McKinsey on Big Data

McKinsey has a full report (from March 2011) describing the meaning and potential impact of so-called big data. You can read the report here. One problem, which the authors of the report do not discuss in detail, is that since so much of what constitutes big data will be collected by private firms, access to it may end up confined to restricted pockets of information. In other words, only certain private actors will have access to big data, and academics might very well be left with very few big data sources.

Wednesday, March 21, 2012

Inequality: Everyone's Thinking About It

I ran into the following articles on inequality, which has been increasing not only structurally but also culturally (in that more policy elites and journalists are discussing the topic openly). Here are some recent posts on inequality:

Reuters is reporting findings from a group of researchers showing that Sweden has undergone an enormous increase in inequality, especially since the rise of the center-right in the political system. Even this country, which many of us in the United States look to as a model of development, has in recent years regressed from the ideals of social democracy.
Using an online survey (with all the caveats about sampling procedures, of course), a group has gathered the views of wealthy Americans on inequality. The biggest finding, which reinforces the importance of class-based analyses of electoral politics: among the wealthy there is a huge gap between self-identified Republicans and Democrats, with over 84% of the latter favoring policies taxing the rich, compared with around 29% of the former.

Universal Limits in High-Dimensional Statistics

The MIT Center on Operations Research is hosting a talk tomorrow on universal limits in high-dimensional statistics. The basic idea is that, for all fields of empirical study from sociology to high-energy physics, some criterion for "statistical significance" is crucial for making decisions based on the data. (The current hunt for the Higgs boson is in fact based on a modified criterion for statistical significance.) The problem, however, is that we are entering a world of big data, in which data structures have many dimensions, thus altering the potential usefulness of such criteria for statistical significance.
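One simple way to see why a fixed significance threshold loses meaning as the number of dimensions grows is the multiple-comparisons problem. The sketch below is only an illustration of that general point, not material from the talk; it assumes the tests are independent, and the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    each run at level alpha, when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # With many variables, "significant at 0.05" somewhere is close to guaranteed.
    for m in (1, 10, 100, 1000):
        print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_threshold(0.05, m))
```

With 100 independent tests at the 0.05 level, the chance of at least one spurious "finding" is above 99%, which is why high-dimensional settings call for stricter or different criteria.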

Sunday, March 18, 2012

Rethinking Tragedy and Success

The social theorist Alain de Botton presents a creative rethinking of the meaning of tragedy and success in a TED talk, shown here:

In essence, he argues that success needs to be rethought using insights from sociology, including an understanding of the limits of the ideal of a meritocratic society (since there is always random chance involved in social mobility), a deeper awareness of how failure as a concept involves particular beliefs and values (so that we can conclude that Hamlet is not a "loser" even though he "lost"), and a sensitivity to the fact that even when particular social and cultural distinctions appear to be irrelevant, economic differences certainly are not (so that comparing oneself to Bill Gates rather than the Queen of England is just as absurd, even though the former wears "business casual").

Saturday, March 17, 2012

Why Inequality Matters

The conservative magazine Commentary has published an article on how social inequality is on the political agenda and on the minds of most Americans, even though many conservatives would prefer the case to be otherwise. The authors argue that, in part, the discussion of inequality should be oriented toward social mobility and poverty, as well as the "injustices" of government policy. What the authors apparently fail to realize is the possibility that inequality causes poverty and immobility, not to mention "unjust" government policies perpetuating inequality. In particular, higher inequality can cause low social mobility by increasing socioeconomic distances between the highest and lowest rungs of society, higher rates of poverty by segregating groups and distorting resource allocations, and inequality-perpetuating government policies by shifting costs from the wealthy to the general population (through, for example, cutting funds for widely-available public services and increasing take-home profits from private organizations).

Friday, March 16, 2012

Inequality "Crisis" of Marriage

The Atlantic Monthly posted a fascinating article today on the inequality "crisis" of marriage. My favorite line in the article: "Gone are the days when the Harvard grad marries the girl with the high school degree simply because, well, she's pretty."

Thursday, March 15, 2012

Corporate Culture Revisited

Greg Smith has a popular post in the NYT titled Why I am Leaving Goldman Sachs. His reason is that the organizational culture is now "as toxic and destructive" as he has "ever seen it." In particular, Smith charges that the values and norms of the organization are oriented almost exclusively toward profit-making, with little or no regard for the well-being of other organizations and people, including its clients.

Misc. Links

MIT students are having a Pi Day recitation and celebration today (since today is 3.14, of course).
The Financial Times discusses Goldman Sachs' corporate culture without, unfortunately, describing what is meant by the phrase; however, I'm glad to see that cultural factors are mentioned, since clearly faulty beliefs, norms, and values contributed to the financial crisis.
The U.S. Census Bureau recently released a report describing the inequality levels (expressed as Gini coefficients) of all counties in the United States from 2006 to 2010; the findings show, as one would expect, that more populous counties are more unequal.
Finally, a new study suggests that first-generation immigrants face a disadvantage in attending college due to a "cultural mismatch" in values and norms between working-class youth and those from middle- and upper-class backgrounds.

Tuesday, March 13, 2012

MIT Inequality Talk

As part of the technology and culture forum at MIT, I attended a talk featuring the notable economists Frank Levy (Professor of Urban Economics at MIT), David Autor (Associate Chair of the MIT economics department), Peter Diamond (MIT Institute Professor Emeritus), and Arjun Jayadev (Assistant Professor of Economics at UMass-Boston). I've read quite a bit of their work, and they have all conducted important research on inequality, poverty, and policy; for instance, Frank Levy's The New Dollars and Dreams: American Incomes and Economic Change is still (over a decade since the last edition was published) one of the best overviews of trends in economic conditions in the United States since World War II. The panelists focused on the causes and consequences of income and wage inequality, as well as possible solutions, with moderation by David Autor.

Scatter Plot Matrix in R

Stata has a large number of graphics capabilities (and I highly recommend Stata over other statistical packages for a variety of reasons), but in a few instances R is more useful. In particular, I find R useful for creating beautiful scatter plot matrices and 3-D graphical displays. To my knowledge, these kinds of graphics are currently very difficult (if not impossible) to create in Stata 12. What I like about scatter plot matrices is that they can have a high data-to-ink ratio, packing together fitted lines, scattered data, histograms, correlations (with text size proportional to the size of the correlation), and statistical significance "stars" (since reviewers seem to like them). Moreover, I like that all the information effectively puts the "stars" associated with statistical significance in appropriate context: there is an incredible amount of variability in the size of correlations and distribution of data among all the "three-star" correlations, underscoring the limited usefulness of statistical significance as a tool for understanding the social reality given to us by data.
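The point about "three-star" correlations can be made concrete with a little arithmetic. The following is a rough sketch, assuming that three stars means a two-sided p-value below 0.001 (a common but not universal convention) and using the Fisher z approximation rather than an exact t-test; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def min_correlation_for_stars(n: int, p_threshold: float = 0.001) -> float:
    """Smallest |r| whose two-sided p-value falls below p_threshold for a
    sample of size n, using the Fisher z approximation (an approximation,
    not an exact test)."""
    if n <= 3:
        raise ValueError("the Fisher z approximation needs n > 3")
    z_crit = NormalDist().inv_cdf(1.0 - p_threshold / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    for n in (30, 1000, 100000):
        print(n, round(min_correlation_for_stars(n), 3))
```

Under these assumptions a correlation needs to be above roughly 0.56 to earn three stars with 30 cases, but only about 0.10 with 1,000 cases and about 0.01 with 100,000, so the same stars can sit next to very different effect sizes.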

Monday, March 12, 2012

Taxes and Inequality

The economist Daron Acemoglu and his colleague James Robinson have an excellent article on the problems with inequality in the United States. You can find it here. In general, I agree with them entirely, and they are persuasive in outlining the negative aspects of political inequality.

3-D Scatter Plots Redux

One weakness of Stata versus R is the lack of 3-D graphing capabilities, in particular 3-D scatter plots. However, with some modifications, Stata can indeed provide a suitable substitute for R in most graphical problems, as shown here (I use the infamous auto data set available in Stata with the sysuse command). The main weakness is that the x-y and y-z planes do not have grid lines; nevertheless, this graph is another indication that Stata's graphing capabilities are much stronger than many R users (and perhaps even Stata users) realize. Here's the graph:

Saturday, March 10, 2012

Checking Weather in Stata

I added a useful Stata command to my computer today: Neal Caren's weathr command in Stata (note that there is no "e"). The command is great: now you can check your day's weather entirely within Stata! The command obtains the current weather conditions and forecast for the next 36 hours from yahoo.com for any zip code in the United States.

Friday, March 09, 2012

Is Everything Culture?

In my readings on culture, I've found a fascinating set of theories called digital physics. These theories posit that the universe fundamentally consists of information (i.e., the "it from bit" doctrine that every particle, atom, quark, and so on is describable as a dichotomous "yes or no" categorization), and thus that the universe is in principle computable. Opponents of digital physics claim that reality is continuous, but the rejoinder is that reality only appears continuous, and is fundamentally categorical (for example, the Planck length suggests that reality is quantized). More relevant to sociology, these perspectives suggest that everything is culture -- i.e., information -- and thus that societies can be usefully modeled as information systems.

Thursday, March 08, 2012

Ternary (or Triaxial) Plots

One rarely-used graphic is the ternary (or triaxial) plot, which is a very useful way of examining a tripartite decomposition of a variable. For example, the graph in this post (which I constructed in Stata using Nicholas J. Cox's commands) displays the composition of an economy over time. Note that the three percentages add to 100 (or, equivalently, the three proportions add to 1).

It's a bit surprising that this graph appears so infrequently; it would appear to be especially useful for political scientists showing voting fractions over time (with the three most prominent parties for each axis), economists examining the composition of an economy (such as above), or sociologists examining over-time trends in any three-part categorical variable (such as "agree," "disagree," and "neutral" on a question of values or attitudes).

However, note that simply because a graph looks like it's a ternary plot does not make it one! For example, Junk Charts dissects this pseudo-ternary plot in the New York Times.
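For readers curious about what makes a plot genuinely ternary, here is a minimal sketch of the underlying geometry in Python. It is not the Stata code used for the graph; the function name and the choice of triangle corners are assumptions made for illustration. Because the three shares sum to 1, each observation has only two free dimensions and can be placed inside an equilateral triangle.

```python
import math
from typing import Tuple


def ternary_to_xy(a: float, b: float, c: float) -> Tuple[float, float]:
    """Map three non-negative components to a point in an equilateral triangle
    with corners at (0, 0) for pure a, (1, 0) for pure b, and
    (0.5, sqrt(3)/2) for pure c. Inputs are rescaled to proportions, so
    percentages (summing to 100) and proportions (summing to 1) both work."""
    total = a + b + c
    if min(a, b, c) < 0 or total <= 0:
        raise ValueError("components must be non-negative and not all zero")
    b_share = b / total
    c_share = c / total
    x = b_share + 0.5 * c_share
    y = (math.sqrt(3) / 2.0) * c_share
    return x, y


if __name__ == "__main__":
    # A hypothetical three-sector economy: 20%, 30%, and 50% of output.
    print(ternary_to_xy(20, 30, 50))
```

A composition that is all one component lands on a corner, and an even three-way split lands at the center of the triangle; a chart whose points do not obey this constraint is not a ternary plot, whatever its shape.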

Wednesday, March 07, 2012

Causality and Ethnography

The University of Chicago is hosting a conference on causality and ethnography on March 8th and 9th. Full details are available here. My own view on the relationship between causality and ethnography is that ethnographers should use counterfactuals, and in fact usually do whether or not they are explicit about them. In modern statistics (in particular, the work of Donald Rubin at Harvard, among others, on the potential outcomes model), the counterfactual model of causality clarifies the conditions under which any particular data set can be interpreted as causal, and shows that these assumptions are extremely strong. Contra the prevailing view of many economists, even instrumental variables regression, regression discontinuity design, and related methods require exceptionally (and often implausibly) strong assumptions for causal interpretation.

The Mystery of Power-Law Distributions

One criticism of sociology, and the macro social sciences more generally (such as political science, anthropology, and economics), is that there are very few "laws" of social reality. There are, however, some sociological regularities that are as yet not fully explained, and which seem bizarre. The most enduring and puzzling of these are power-law distributions (a well-known special case is "Zipf's Law"), in which "large" instances of things are extremely rare, while "small" occurrences of things are extremely common (where size can refer to frequency in a population, population size, geographic space, and so on). In practice this means that a handful of words are much more frequent than other words (and most words are rarely used), wealth is concentrated in a small number of people (and most people are poor), there are a handful of really popular songs (and a vast number of unpopular tunes), and so on. Even the sizes of sand particles on a beach follow a power-law distribution: how often have you seen a boulder on a beach?

What might explain the ubiquity of power-law distributions? As far as I can tell, nobody is entirely sure, although we have some good guesses. For example, the social scientist Herbert Simon outlined a theory of preferential growth attachment (also known as the "rich get richer" effect), in which songs that are already fairly popular will become more popular, cities that are already large will become even larger, and words already used widely will become even more widely used. Note that this explanation hinges on a positive feedback effect: the probability that any thing gets "larger" is directly proportional to the current "largeness" of the thing; or, to put it another way, large values get amplified rather than cancelled out (as in a normal distribution).
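The "rich get richer" mechanism is easy to simulate. The following Python sketch is in the spirit of Simon's model but is a simplified illustration rather than a faithful reproduction; the function name, the probability of a new item, and the number of steps are all arbitrary assumptions.

```python
import random
from collections import Counter
from typing import List


def simulate_rich_get_richer(n_steps: int, p_new: float, seed: int = 0) -> List[int]:
    """Return item sizes (largest first) after n_steps of growth.

    At each step a brand-new item appears with probability p_new; otherwise an
    existing item grows by one, chosen with probability proportional to its
    current size (the positive feedback effect)."""
    rng = random.Random(seed)
    # Each entry is one "use" of an item. An item of size k appears k times,
    # so a uniform draw from this list picks items in proportion to size.
    uses = [0]
    n_items = 1
    for _ in range(n_steps):
        if rng.random() < p_new:
            uses.append(n_items)
            n_items += 1
        else:
            uses.append(rng.choice(uses))
    return sorted(Counter(uses).values(), reverse=True)


if __name__ == "__main__":
    sizes = simulate_rich_get_richer(n_steps=20000, p_new=0.1)
    print("items:", len(sizes))
    print("five largest:", sizes[:5])
    print("median size:", sizes[len(sizes) // 2])
```

In runs like this a few early items typically become enormous while the typical item stays at or near the minimum size, which is the qualitative signature of the skewed distributions described above.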

Power-law distributions have important cultural, statistical, and political implications.

Culturally, there are several implications. First, most cultural constructs are rarely used and only a handful are common among any group of people. To put it another way, the shared part of culture is likely to be relatively small, while the particular part of culture is vast. Second, frequently used cultural constructs are particularly stable over time; that is, 500 years from now the word "the" will still be used, while "sesquipedalian" has a more uncertain future. Third, the stability of a cultural system is derived from the more frequently used cultural constructs, while the dynamism is among the less frequently used constructs. Fourth, initial conditions are extremely important for the frequency and hence durability of cultural constructs: for instance, small, random fluctuations led to the popularity of "the" in the English language. Finally, following from the previous point, the consequences of initial conditions are highly unpredictable; given small initial changes, English speakers today might be using the word "tha" or "se" instead of "the."

Statistically, the presence of power-law distributions is a reminder that classical linear regression (based on the normal distribution) is not always the appropriate fit to a scatter plot of two variables, and that summarizing a distribution as a mean or median can be highly misleading.

Politically, power-law distributions have a unique implication for efforts to deal with wealth inequality: one effective way to alter the distribution of wealth is to remove the positive feedback effects from wealth. The desired distribution of wealth would thus be described by a normal rather than power law function. Importantly, removing the positive feedback effects of wealth would not lead to the removal of inequality, but rather a change in the distribution so that the mean, median, and mode are the same. From this perspective, policies should be in place so that (in principle) a person's change in wealth is independent of their current level of wealth. Such policies might include very high taxes on capital gains, restrictions on the influence of wealth in political decision-making, rules specifying equal monetary amounts from promotions for all occupational levels in a firm, and so on.
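The contrast between the two regimes can be sketched in Python. This is a toy simulation, not an empirical model: the starting wealth of 100, the shock sizes, and the function name are arbitrary assumptions. Note also that purely proportional random shocks of this kind produce a lognormal distribution, which is strongly right-skewed but not an exact power law.

```python
import math
import random
from statistics import mean, median
from typing import List


def simulate_wealth(n_people: int, n_steps: int, proportional: bool, seed: int = 0) -> List[float]:
    """Simulate wealth paths under random shocks.

    proportional=False: each change is independent of current wealth.
    proportional=True: each change scales with current wealth (positive feedback)."""
    rng = random.Random(seed)
    wealth = [100.0] * n_people
    for _ in range(n_steps):
        for i in range(n_people):
            shock = rng.gauss(0.0, 1.0)
            if proportional:
                wealth[i] *= math.exp(0.1 * shock)
            else:
                wealth[i] += shock
    return wealth


if __name__ == "__main__":
    for label, flag in (("independent changes", False), ("proportional changes", True)):
        w = simulate_wealth(n_people=2000, n_steps=100, proportional=flag)
        print(label, "mean/median =", round(mean(w) / median(w), 2))
```

When changes are independent of current wealth, inequality remains but the mean and median nearly coincide; when changes are proportional to current wealth, the mean is pulled well above the median by a small number of very large fortunes.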

Monday, March 05, 2012

Visualizing a Correlation Table

Correlation tables are ubiquitous in social science research, but they are very rarely visualized. As I've emphasized in previous posts, I'm a strong advocate for visualizing data and models whenever possible. For example, for my research I graphed correlations using Adrian Mander's plotmatrix command in Stata. Using Mander's package, I could create a graph that clearly shows all the information in a parsimonious way; moreover, unlike a correlation table, correlation patterns are intuitively grasped from the shading of the cells, and there is an implicit emphasis on the correlation size rather than statistical significance.
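The idea of shading cells by correlation size can be imitated even in plain text. The sketch below is a Python illustration of the concept and has nothing to do with how plotmatrix is implemented; the variable names, the data values, and the shading characters are made up for the example.

```python
from statistics import mean
from typing import Dict, List

SHADES = " .:*#"  # lighter to darker as |r| grows


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shade(r: float) -> str:
    """Pick a character whose darkness reflects the absolute correlation."""
    index = min(int(abs(r) * len(SHADES)), len(SHADES) - 1)
    return SHADES[index]


def correlation_matrix(data: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """All pairwise correlations, keyed by variable name."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


def render(matrix: Dict[str, Dict[str, float]]) -> str:
    """One text row per variable, two shaded characters per cell."""
    names = list(matrix)
    rows = []
    for a in names:
        cells = "".join(shade(matrix[a][b]) * 2 for b in names)
        rows.append(f"{a:>8} {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    data = {
        "income": [1, 2, 3, 4, 5],
        "educ": [2, 4, 5, 4, 5],
        "distance": [5, 3, 4, 1, 2],
    }
    print(render(correlation_matrix(data)))
```

Even in this crude form, the eye goes to the darkest cells first, which puts the emphasis on the size of the correlations rather than on significance stars.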

Sunday, March 04, 2012

Why Models are Not Data

In doing research, sometimes it can be easy to think that the models one is using are in fact the data -- but this is clearly not true. Even the mean of a sample of data is a model of the central tendency of the data, and not the data itself. One clear example of why models are not data is Anscombe's quartet. For example, take the following:

What is remarkable about this quartet is that for all of these scatter plots the mean of x is the same (exactly), the variance of x is the same (exactly), the mean of y is the same (to two decimal places), the variance of y is nearly the same (between about 4.12 and 4.13), the correlation between x and y is the same (to three decimal places), and the linear regression equation is the same (to two or three decimal places). In other words, the models of the data (e.g., mean, variance, correlation, etc.) are the same, but the data are not!
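These claims are easy to check directly. The Python sketch below uses the standard published values of Anscombe's quartet and computes the summary statistics by hand with the standard library; the dictionary and function names are my own.

```python
from statistics import mean, variance

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, sample variances, correlation, and least-squares line for y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (x, y) in ANSCOMBE.items():
        stats = summarize(x, y)
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows are nearly indistinguishable, even though the four scatter plots look nothing alike.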

So what's the solution? As I've mentioned in previous posts, graphing the data is crucial, because we're forced to confront the actual data, and not models of the data.

Saturday, March 03, 2012

R versus Stata Redux

I've used both R and Stata for a long time, but these days I use Stata much more frequently than R. While R is useful for some kinds of graphics (especially three-dimensional graphics) and some statistical procedures (for example, finite mixture models), in general I prefer Stata as the go-to statistical program. The reasons are clear: Stata has superior help files for almost all ado files, Stata graphics are excellent (even contour plots are available in Stata), cleaning data is a breeze in Stata but awkward in R, labeling data is much more efficient in Stata (in fact, as far as I can tell R does not allow for labeling variable names, while Stata allows for labeling levels of a variable, the variable itself, and the data set), and for many procedures Stata's syntax is much more parsimonious than R's.

Yet, R is worth learning because the 3-D graphics available are often extremely useful for exploring the data, and there will certainly be cases in which R will have statistical procedures that are unavailable or cumbersome in Stata (Bayesian analyses and finite mixture models come to mind, for example).

Friday, March 02, 2012

Culture and Poverty

The New York Times has an article covering the concept of the culture of poverty here. The article is fairly accurate, and does a good job highlighting that the study of culture and poverty had its origins among left-wing Marxists (although I would have mentioned Bowles and Gintis, who emphasized that cultural values and norms of obedience to capitalist ideologies, rather than intelligence, contribute to the social reproduction of inequality). The author elides the fact that the problem with the concept of the "culture of poverty" is that such a thing does not, and never has, existed: culture is everywhere, not just among a subset of the economically disadvantaged. The appropriate question, then, is: given that we know that culture is a constituent part of the human experience, how does it matter not just for poverty, but for happiness, well-being, inequality, wealth, and so on?

Thursday, March 01, 2012

Values and Politics

I'm a bit biased, but the front page of the Huffington Post highlighted a fascinating study on education, culture and politics today.