Hidden meanings in a torrent of tweets

Twitter word choices that were the most positively correlated with age-adjusted mortality from atherosclerotic heart disease within a county. The size of the word represents its relative prevalence.

With tens of millions of people posting to social media from mobile devices, researchers have gained access to a growing torrent of geotagged utterances. All those tweets, Facebook updates, and Instagram posts can be strangely revealing.

A city’s aggregate output on Twitter, for example, presents a surprisingly accurate mirror of the psychological and social health of a community. In places where people tweet more negative, angry, and disengaged words (shown above), the risk of dying from atherosclerotic heart disease is significantly greater. In places where the language of tweets is more positive and engaged, heart disease mortality is lower.
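To make "positively correlated" concrete, here is a minimal sketch in Python. It assumes each county has been boiled down to two numbers: how often one word shows up in that county's tweets, and the county's age-adjusted mortality rate. The function name `pearson_r` and every number below are hypothetical, made up purely for illustration; none of it is data or code from the study, and the study's own analysis is not reproduced here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Made-up values for five imaginary counties (NOT data from the study).
word_rate = [0.0010, 0.0014, 0.0021, 0.0025, 0.0030]  # share of tweets using one word
mortality = [38.0, 41.5, 47.0, 52.5, 55.0]            # deaths per 100,000, age-adjusted

r = pearson_r(word_rate, mortality)
print(f"r = {r:.2f}")
```

A value of r near +1 means that counties where the word is more common tend to have higher mortality, and a value near -1 means the opposite. Either way, it describes a pattern across counties, not a cause.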

The maps below show age-adjusted mortality from atherosclerotic heart disease as reported by the CDC and as predicted by Twitter language. (Counties not colored lacked reliable data.)

Tweets seem to reflect important characteristics of a community’s economic, physical, and psychological environment, the authors say:

At the individual level, psychological variables and heart-disease risk are connected through multiple pathways, including health behaviors, social relationships, situation selection, and physiological reactivity. These pathways occur within a broader social context that directly and indirectly influences an individual’s life experiences. Local communities create physical and social environments that influence the behaviors, stress experiences, and health of their residents. Epidemiological studies have found that the aggregated characteristics of communities, such as social cohesion and social capital, account for a significant portion of variation in health outcomes, independently of individual-level characteristics, such that the combined psychological character of the community is more informative for predicting risk than are the self-reports of any one individual. The language of Twitter may be a window into the aggregated and powerful effects of the community context.

that’s lame af !!

On the linguistic front, social media is making it possible to track the usage decisions of millions of individuals to reveal the dynamics of language change.

From Diffusion of Lexical Change in Social Media, by Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, and Eric P. Xing. PLoS One (2014)

The figures above show how ‘af’, a new abbreviation for ‘as fuck’, spread geographically over the 150 weeks after it emerged on Twitter, as in the following tweets:

dudes , if u’re tryin to holla don’t rap on my voicemail that’s a huge turn off !! that’s lame af !!

@jessicathug lol really tho this bitch is fake af ! these bitches frm the i.e is fake i tell u

ugh jus drank lik 2spoons fulls of thiz cough surup nd im sleepy af shit hits u hard blaah nd i have hw to do 😮 >.<

my sister be speakin some foreign language when she first wake up on top of the fact that she act confused af for no reason

this boy lookin at me hard af n he w/a date

I was surprised to see that the racial and ethnic composition of cities can be a stronger predictor of linguistic influence than geographic proximity. The more similar two metropolitan areas are racially, the more likely it is that language spreads between them, according to Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, and Eric P. Xing, who say:

This indicates that while language change does spread geographically, demographics play a central role, and nearby cities may remain linguistically distinct if they differ demographically, particularly in terms of race. Examples of linguistically linked city pairs that are geographically distant but demographically similar include Washington D.C. and New Orleans (high proportions of African-Americans), Los Angeles and Miami (high proportions of Hispanics), and Boston and Seattle (relatively few minorities, compared with other large cities).

Chanclas o chancletas?

Bruno Gonçalves and David Sánchez collected every Spanish-language tweet during a period of more than two years and built a data set to study dialects on a global scale. The delightful map below shows where different words for sandals predominate across North and South America. You can peruse many more dialect maps at Gonçalves’ website.

A dialect map by Bruno Gonçalves.

According to Gonçalves and Sánchez, global Spanish has diverged into two superdialects: an urban speech used in big cities in the Americas and Spain, and a diverse form used in rural areas and small towns. The latter clusters into smaller varieties with a more localized character. Gonçalves and Sánchez explain their methods in an open access paper in PLoS One.

[In response to my Tumblr post on these maps, lucas-veg noted that in Chile there are even more Spanish words for sandals: “We, chileans, by ‘zapatillas’ mean ‘sneakers’/‘running shoes’. Most people say ‘chalas’, ‘chancletas’, ‘hawaianas’ or ‘condoritos’ when talking about sandals.”

And ethuil on Tumblr added a data point for Cuba: “we do say ‘chancletas’”. Cuba was excluded from the analysis because the data comes from what people said on Twitter. As I understand it, Cuba’s censorship laws make it practically impossible for most Cubans to connect with the rest of the world on social media.]

Psychological Language on Twitter Predicts County-Level Heart Disease Mortality, by Johannes C. Eichstaedt and others. Psychological Science (Jan 20, 2015)

Diffusion of Lexical Change in Social Media, by Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, and Eric P. Xing. PLoS One (2014)
