Friday, March 30, 2012
Physics Envy
The NYT published an op-ed today by a pair of political scientists on "physics envy" among sociologists, economists, and political scientists. The authors mainly argue that theory can be useful even when it is wrong or unsupported by data, and briefly mention that data analysis is useful even if theoretical contributions are not obvious. I disagree with the former, but not the latter. For a similar view, see this post by the theoretical physicist Sean Carroll.
Thursday, March 29, 2012
Irving Louis Horowitz
The eminent political sociologist died a few days ago, according to an obit in the NYT. Long ago I read, and took seriously, his book The Decomposition of Sociology, in which he argues (essentially) for more empirical analysis and less left-wing politics in sociology. Reflecting on his book, I think he neglects a fundamental, possibly cultural, contradiction: to the extent social reality exhibits facts consistent with liberalism and inconsistent with conservatism, empirical analysis will result in more liberal than conservative belief systems (but not values, since those cannot be proven "right" or "wrong" by scientific analysis). For example, evidence is accumulating that economic inequality (which is of little concern to most conservatives in the United States) has numerous deleterious effects, thus forcing conservatives either to hold beliefs inconsistent with the evidence (i.e., inequality is unrelated to deleterious effects) or to alter their values (i.e., it is a "good" thing to have high rates of violence, low social mobility, and so forth).
Wednesday, March 28, 2012
MyPersonality
I highly recommend this website for learning about your attitudes, values, beliefs, and overall personality.
Sunday, March 25, 2012
Why are Economists so (Consistently) Led Astray About Inequality?
Ed Glaeser, a conservative urban economist at Harvard, recently wrote a Boston Globe article titled Why income disparity in Boston isn't a bad thing. Glaeser is right that inequality increases in a city such as Boston can be due to selection effects, since poor people are moving into Boston for economic and cultural opportunities. Yet these selection effects (i.e., poor people moving into a geographic area in the hopes of upward mobility, which is generally considered a good thing) are drastically different from the observed outcomes (i.e., large disparities in people's wealth due to their social positions in a system of occupations, which is generally considered a bad thing). Glaeser nevertheless conflates the two, confusing the reader and, perhaps, himself. A more accurate title for the article would have been "Why poor people moving into Boston isn't a bad thing." This raises a question: why are economists so (consistently) led astray about the causes and consequences of economic, social, and political inequality?
Popularity of Programming Languages
As you can see, R is relatively popular (but more so on StackOverflow than GitHub):
For the original graph, click here. This scatter plot is a reminder that R is useful to learn not only for statistical modeling (since there are so many excellent packages available), but also as a way to become familiar with programming more generally.
Saturday, March 24, 2012
Big Science and Sociology
I highly recommend this video featuring Dirk Helbing, a sociologist and erstwhile physicist who is (along with others) attempting to create a CERN-like, society-simulating project for the social sciences by combining information from large data sets with simulated models of complex social systems:
Thursday, March 22, 2012
Statistical Lexicon
Anyone doing statistical analysis (or contemplating it) should read Andy Gelman's informative, humorous, and dead-on correct post on statistical lexicon.
McKinsey on Big Data
McKinsey has a full report (from March 2011) describing the meaning and potential impact of so-called big data. You can read the report here. One problem, which the authors of the report do not discuss in detail, is that, since so much of what constitutes big data will be collected by private firms, there are possibilities of restricted information pockets. In other words, only certain private actors will have access to big data, and academics might very well be left with very few big data sources.
Wednesday, March 21, 2012
Inequality: Everyone's Thinking About It
I ran into the following articles on inequality, which has been increasing not only structurally but also culturally (in that more policy elites and journalists are discussing the topic openly). Here are some recent posts on inequality:
 Reuters is reporting findings from a group of researchers showing that Sweden has undergone an enormous increase in inequality, especially since the rise of the center-right in the political system. For those of us in the United States who look to Sweden as a model of development, in recent years even this country has regressed from the ideals of social democracy.
 Based on an online survey (with all the caveats about sampling procedures, of course), a group has surveyed wealthy Americans on their views on inequality. The biggest finding, which reinforces the importance of class-based analyses of electoral politics: among the wealthy there is a huge gap between self-identified Republicans and Democrats, with over 84% of the latter favoring policies taxing the rich, compared with around 29% of the former.
Universal Limits in High-Dimensional Statistics
The MIT Center on Operations Research is hosting a talk tomorrow on universal limits in high-dimensional statistics. The basic idea is that, for all fields of empirical study from sociology to high-energy physics, some criterion for "statistical significance" is crucial for making decisions based on the data. (The current hunt for the Higgs boson is in fact based on a modified criterion for statistical significance.) The problem, however, is that we are entering a world of big data, in which data structures have many dimensions, thus altering the potential usefulness of such criteria for statistical significance.
Sunday, March 18, 2012
Rethinking Tragedy and Success
The social theorist Alain de Botton presents a creative rethinking of the meaning of tragedy and success in a TED talk, shown here:
In essence, he argues that success needs to be rethought using insights from sociology, including an understanding of the limits of the ideal of a meritocratic society (since there is always random chance involved in social mobility), a deeper awareness of how failure as a concept involves particular beliefs and values (so that we can conclude that Hamlet is not a "loser" even though he "lost"), and a sensitivity to the fact that even when particular social and cultural distinctions appear to be irrelevant, economic differences certainly are not (so that comparing oneself to Bill Gates rather than the Queen of England is just as absurd, even though the former wears "business casual").
Saturday, March 17, 2012
Why Inequality Matters
The conservative magazine Commentary has published an article on how social inequality is on the political agenda and on the minds of most Americans, even though many conservatives would prefer the case to be otherwise. The authors argue that, in part, the discussion of inequality should be oriented toward social mobility and poverty, as well as the "injustices" of government policy. What the authors apparently fail to realize is the possibility that inequality causes poverty and immobility, not to mention "unjust" government policies perpetuating inequality. In particular, higher inequality can cause low social mobility by increasing socioeconomic distances between the highest and lowest rungs of society, higher rates of poverty by segregating groups and distorting resource allocations, and inequality-perpetuating government policies by shifting costs from the wealthy to the general population (through, for example, cutting funds for widely available public services and increasing take-home profits from private organizations).
Friday, March 16, 2012
Inequality "Crisis" of Marriage
The Atlantic Monthly posted a fascinating article today on the inequality "crisis" of marriage. My favorite line in the article: "Gone are the days when the Harvard grad marries the girl with the high school degree simply because, well, she's pretty."
Thursday, March 15, 2012
Corporate Culture Revisited
Greg Smith has a popular post in the NYT titled Why I am Leaving Goldman Sachs. His reason is that the organizational culture is now "as toxic and destructive" as he has "ever seen it." In particular, Smith charges that the values and norms of the organization are oriented almost exclusively toward profit-making, with little or no regard for the well-being of other organizations and people, including the firm's own clients.
Wednesday, March 14, 2012
Misc. Links
 MIT students are having a Pi Day recitation and celebration today (since today is 3.14, of course).
 The Financial Times discusses Goldman Sachs' corporate culture without, unfortunately, describing what is meant by the phrase; however, I'm glad to see that cultural factors are mentioned, since clearly faulty beliefs, norms, and values contributed to the financial crisis.
 The U.S. Census Bureau recently released a report describing the inequality levels (expressed as Gini coefficients) of all counties in the United States from 2006 to 2010; the findings show, as one would expect, that more populous counties are more unequal.
 Finally, a new study suggests that first-generation immigrants face a disadvantage in attending college due to a "cultural mismatch" in values and norms between working-class youth and those from middle- and upper-class backgrounds.
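For readers unfamiliar with the Gini coefficient mentioned in the Census item above, here is a minimal Python sketch of one common way to compute it from a list of incomes. The function name and the example incomes are hypothetical, and this is not necessarily the estimator the Census Bureau uses; it is only meant to show that 0 corresponds to perfect equality and that values approach 1 as income concentrates in a few hands.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = everyone has the same amount)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    # Rank-weighted sum: larger incomes get larger ranks (1..n).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical household incomes (in thousands) for two made-up counties.
small_county = [38, 41, 44, 47, 52]
large_county = [12, 25, 44, 90, 400]

print(f"small county Gini: {gini(small_county):.3f}")
print(f"large county Gini: {gini(large_county):.3f}")
```

With a finite sample of n values, the maximum this formula can reach is (n - 1)/n rather than exactly 1.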
Tuesday, March 13, 2012
MIT Inequality Talk
Scatter Plot Matrix in R
Stata has a large number of graphics capabilities (and I highly recommend Stata over other statistical packages for a variety of reasons), but in a few instances R is more useful. In particular, I find R useful for creating beautiful scatter plot matrices and 3D graphical displays. To my knowledge, these kinds of graphics are currently very difficult (if not impossible) to create in Stata 12. What I like about scatter plot matrices is that they can have a high data-to-ink ratio, packing together fitted lines, scattered data, histograms, correlations (displayed in proportion to the size of the correlation), and statistical significance "stars" (since reviewers seem to like them). Moreover, I like that all the information effectively puts the "stars" associated with statistical significance in appropriate context: there is an incredible amount of variability in the size of correlations and distribution of data among all the "three-star" correlations, underscoring the limited usefulness of statistical significance as a tool for understanding the social reality given to us by data.
Monday, March 12, 2012
Taxes and Inequality
The economist Daron Acemoglu and his colleague James Robinson have an excellent article on the problems with inequality in the United States. You can find it here. In general, I agree with them entirely, and they are persuasive in outlining the negative aspects of political inequality.
Sunday, March 11, 2012
3D Scatter Plots Redux
One weakness of Stata versus R is the lack of 3D graphing capabilities, in particular 3D scatter plots. However, with some modifications, Stata can indeed provide a suitable substitute for R in most graphical problems, as shown here (I use the infamous auto data set available in Stata with the sysuse command). The main weakness is that the xy and yz planes do not have grid lines; nevertheless, this graph is another indication that Stata's graphing capabilities are much stronger than many R users (and perhaps even Stata users) realize. Here's the graph:
Saturday, March 10, 2012
Checking Weather in Stata
I added a useful Stata command to my computer today: Neal Caren's weathr command in Stata (note that there is no "e"). The command is great: now you can check your day's weather entirely within Stata! The command obtains the current weather conditions and forecast for the next 36 hours from yahoo.com for any zip code in the United States.
Friday, March 09, 2012
Is Everything Culture?
In my readings on culture, I've found a fascinating set of theories called digital physics. These theories posit that the universe fundamentally consists of information (i.e., the "it from bit" doctrine that every particle, atom, quark, and so on is describable as a dichotomous "yes or no" categorization), and thus that the universe is in principle computable. Opponents of digital physics claim that reality is continuous, but the rejoinder is that reality only appears continuous, and is fundamentally categorical (for example, the Planck length suggests that reality is quantized). More relevant to sociology, these perspectives suggest that everything is culture (i.e., information), and thus that societies can be usefully modeled as information systems.
Thursday, March 08, 2012
Ternary (or Triaxial) Plots
One rarely used graphic is the ternary (or triaxial) plot, which is a very useful way of examining a tripartite decomposition of a variable. For example, the graph in this post displays the composition (which I constructed in Stata using Nicholas J. Cox's commands) of an economy over time. Note that the three percentages add to 100 (or, equivalently, the three proportions add to 1).
It's a bit surprising that this graph appears so infrequently; it would appear to be especially useful for political scientists showing voting fractions over time (with the three most prominent parties for each axis), economists examining the composition of an economy (such as above), or sociologists examining over-time trends in any three-part categorical variable (such as "agree," "disagree," and "neutral" on a question of values or attitudes).
However, note that simply because a graph looks like it's a ternary plot does not make it one! For example, Junk Charts dissects this pseudo-ternary plot in the New York Times.
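To make the geometry concrete, here is a minimal Python sketch (not the Stata commands used for the graph in this post) of how a three-part composition maps to a point inside an equilateral triangle. The function name, the corner layout, and the sector shares are all hypothetical and chosen for illustration only.

```python
import math


def ternary_to_xy(a, b, c):
    """Map a three-part composition to (x, y) inside an equilateral triangle.

    Corner layout assumed here: all-a at (0, 0), all-b at (1, 0),
    all-c at (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0 or min(a, b, c) < 0:
        raise ValueError("parts must be non-negative with a positive sum")
    # Rescale so the three parts are proportions that add to 1.
    a, b, c = a / total, b / total, c / total
    x = b + c / 2
    y = c * math.sqrt(3) / 2
    return x, y


# Hypothetical sector shares (agriculture, industry, services) in percent.
economy = {1950: (40, 35, 25), 1980: (15, 40, 45), 2010: (5, 25, 70)}

for year, shares in economy.items():
    x, y = ternary_to_xy(*shares)
    print(f"{year}: shares={shares} -> x={x:.3f}, y={y:.3f}")
```

Because the three parts always sum to a constant, only two of them are free to vary, which is why the whole composition fits on a flat triangle.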
Wednesday, March 07, 2012
Causality and Ethnography
The University of Chicago is hosting a conference on causality and ethnography on March 8th and 9th. Full details are available here. My own view on the relationship between causality and ethnography is that ethnographers should use counterfactuals, and in fact usually do whether or not they are explicit about them. In modern statistics (in particular, the work of Donald Rubin at Harvard, among others, on the potential outcomes model), the counterfactual model of causality clarifies the conditions under which any particular data set can be interpreted as causal, and shows that these conditions are extremely strong. Contra the prevailing view of many economists, even instrumental variables regression, regression discontinuity design, and related methods require exceptionally (and often implausibly) strong assumptions for causal interpretation.
Tuesday, March 06, 2012
The Mystery of Power-Law Distributions
One criticism of sociology, and the macro social sciences more generally (such as political science, anthropology, and economics), is that there are very few "laws" of social reality. There are, however, some sociological regularities that are as yet not fully explained, and which seem bizarre. The most enduring and puzzling of these are power-law distributions (a well-known special case is "Zipf's Law"), in which "large" instances of things are extremely rare, while "small" occurrences of things are extremely common (where size can refer to frequency in a population, population size, geographic space, and so on). In practice this means that a handful of words are much more frequent than other words (and most words are rarely used), wealth is concentrated in a small number of people (and most people are poor), there are a handful of really popular songs (and a vast number of unpopular tunes), and so on. Even the sizes of sand particles on a beach follow a power-law distribution: how often have you seen a boulder on a beach?
What might explain the ubiquity of power-law distributions? As far as I can tell, nobody is entirely sure, although we have some good guesses. For example, the social scientist Herbert Simon outlined a theory of preferential growth attachment (also known as the "rich get richer" effect), in which songs that are already fairly popular will become more popular, cities that are already large will become even larger, and words already used widely will become even more widely used. Note that this explanation hinges on a positive feedback effect: the probability that any thing gets "larger" is directly proportional to the current "largeness" of the thing; or, to put it another way, large values get amplified rather than cancelled out (as in a normal distribution).
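The positive feedback mechanism can be sketched with a small simulation. The Python code below is only a toy version of a "rich get richer" process, loosely in the spirit of Simon's model; the function name, the number of steps, the entry probability, and the seed are all assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def simulate_rich_get_richer(steps=20000, p_new=0.1, seed=42):
    """Toy preferential-growth process.

    At each step a brand-new item appears with probability p_new; otherwise
    an existing item grows by one unit, chosen with probability proportional
    to its current size.
    """
    rng = random.Random(seed)
    tokens = [0]   # one list entry per unit of size; item 0 starts at size 1
    n_items = 1
    for _ in range(steps):
        if rng.random() < p_new:
            tokens.append(n_items)            # new item enters with size 1
            n_items += 1
        else:
            # A uniform draw over units picks items in proportion to size.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)


sizes = simulate_rich_get_richer()
ranked = sorted(sizes.values(), reverse=True)
top_decile = ranked[: max(1, len(ranked) // 10)]

print("number of items:", len(ranked))
print("largest item:", ranked[0])
print("median item:", statistics.median(ranked))
print(f"share held by top 10% of items: {sum(top_decile) / sum(ranked):.2f}")
```

Setting the growth step to pick an existing item uniformly at random, instead of in proportion to size, removes the feedback and gives a much less skewed distribution, which is a useful comparison to run.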
Power-law distributions have important cultural, statistical, and political implications.
Culturally, there are several implications. First, most cultural constructs are rarely used and only a handful are common among any group of people. To put it another way, the shared part of culture is likely to be relatively small, while the particular part of culture is vast. Second, frequently used cultural constructs are particularly stable over time; that is, 500 years from now the word "the" will still be used, while "sesquipedalian" has a more uncertain future. Third, the stability of a cultural system is derived from the more frequently used cultural constructs, while the dynamism is among the less frequently used constructs. Fourth, initial conditions are extremely important for the frequency and hence durability of cultural constructs: for instance, small, random fluctuations led to the popularity of "the" in the English language. Finally, following from the previous point, the consequences of initial conditions are highly unpredictable; given small initial changes, English speakers today might be using the word "tha" or "se" instead of "the."
Statistically, the presence of power-law distributions is a reminder that classical linear regression (based on the normal distribution) is not always the appropriate fit to a scatter plot of two variables, and that summarizing a distribution as a mean or median can be highly misleading.
Politically, power-law distributions have a unique implication for efforts to deal with wealth inequality: one effective way to alter the distribution of wealth is to remove the positive feedback effects from wealth. The desired distribution of wealth would thus be described by a normal rather than a power-law function. Importantly, removing the positive feedback effects of wealth would not lead to the removal of inequality, but rather a change in the distribution so that the mean, median, and mode are the same. From this perspective, policies should be in place so that (in principle) a person's change in wealth is independent of their current level of wealth. Such policies might include very high taxes on capital gains, restrictions on the influence of wealth in political decision-making, rules specifying equal monetary amounts from promotions for all occupational levels in a firm, and so on.
Monday, March 05, 2012
Visualizing a Correlation Table
Correlation tables are ubiquitous in social science research, but they are very rarely visualized. As I've emphasized in previous posts, I'm a strong advocate for visualizing data and models whenever possible. For example, for my research I graphed correlations using Adrian Mander's plotmatrix command in Stata. Using Mander's package, I could create a graph that clearly shows all the information in a parsimonious way; moreover, unlike a correlation table, correlation patterns are intuitively grasped from the shading of the cells, and there is an implicit emphasis on the correlation size rather than statistical significance.
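The shading idea can be illustrated without any graphics package. The Python sketch below is not Mander's plotmatrix; it is a rough text-mode stand-in that prints each correlation next to a block character whose darkness tracks the absolute size of the correlation. The variable names and data values are invented for the example.

```python
import math

SHADES = " ░▒▓█"  # lightest to darkest


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shade(r):
    """Pick a darker block for a larger absolute correlation."""
    index = min(int(abs(r) * len(SHADES)), len(SHADES) - 1)
    return SHADES[index]


# Hypothetical data: three variables measured on five respondents.
data = {
    "income": [20, 35, 50, 65, 80],
    "educ": [10, 12, 12, 16, 18],
    "age": [52, 31, 47, 25, 60],
}

names = list(data)
print(" " * 8 + "".join(f"{name:>10}" for name in names))
for row in names:
    cells = []
    for col in names:
        r = pearson(data[row], data[col])
        cell = f"{shade(r) * 2} {r:+.2f}"
        cells.append(cell.rjust(10))
    print(f"{row:<8}" + "".join(cells))
```

Note that the shading depends only on the size of each correlation, so nothing in the display privileges statistical significance.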
Sunday, March 04, 2012
Why Models are Not Data
In doing research, sometimes it can be easy to think that the models one is using are in fact the data, but this is clearly not true. Even the mean of a sample of data is a model of the central tendency of the data, and not the data itself. One clear example of why models are not data is Anscombe's quartet. For example, take the following:
What is remarkable about this quartet is that for all of these scatter plots the mean of x is the same (exactly), the variance of x is the same (exactly), the mean of y is the same (to two decimal places), the variance of y is nearly the same (to within about 0.005), the correlation between x and y is the same (to three decimal places), and the linear regression equation is the same (to two or three decimal places). In other words, the models of the data (e.g., mean, variance, correlation, etc.) are the same, but the data are not!
So what's the solution? As I've mentioned in previous posts, graphing the data is crucial, because we're forced to confront the actual data, and not models of the data.
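The summary statistics are easy to check directly. The Python sketch below uses the standard published values of Anscombe's quartet; the helper function and its name are my own, and the code only reproduces the numerical summaries, not the scatter plots, which is exactly why it cannot reveal how different the four data sets look.

```python
import math
from statistics import mean, variance

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Mean, sample variance, correlation, and least-squares line for (x, y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} r={s['r']:.3f} "
        f"y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```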
Saturday, March 03, 2012
R versus Stata Redux
I've used both R and Stata for a long time, but these days I use Stata much more frequently than R. While R is useful for some kinds of graphics (especially three-dimensional graphics) and some statistical procedures (for example, finite mixture models), in general I prefer Stata as the go-to statistical program. The reasons are clear: Stata has superior help files for almost all ado files, Stata graphics are excellent (even contour plots are available in Stata), cleaning data is a breeze in Stata but awkward in R, labeling data is much more efficient in Stata (in fact, as far as I can tell R does not allow for labeling variable names, while Stata allows for labeling levels of a variable, the variable itself, and the data set), and for many procedures Stata's syntax is much more parsimonious than R's.
Yet, R is worth learning because the 3D graphics available are often extremely useful for exploring the data, and there will certainly be cases in which R will have statistical procedures that are unavailable or cumbersome in Stata (Bayesian analyses and finite mixture models come to mind, for example).
Friday, March 02, 2012
Culture and Poverty
The New York Times has an article covering the concept of the culture of poverty here. The article is fairly accurate, and does a good job highlighting that the study of culture and poverty had its origins in left-wing Marxists (although I would have mentioned Bowles and Gintis, who emphasized that cultural values and norms of obedience to capitalist ideologies, rather than intelligence, contribute to the social reproduction of inequality). The author elides the fact that the problem with the concept of the "culture of poverty" is that such a thing does not, and never has, existed: culture is everywhere, not just among a subset of the economically disadvantaged. The appropriate question, then, is: given that we know that culture is a constituent part of the human experience, how does it matter not just for poverty, but for happiness, well-being, inequality, wealth, and so on?
Thursday, March 01, 2012
Values and Politics
I'm a bit biased, but the front page of the Huffington Post highlighted a fascinating study on education, culture and politics today.