To many people, “geek” and “nerd” are synonyms, but in fact they are a little different. Consider the phrase “sports geek,” an occasional substitute for “jock” and perhaps the arch-rival of a “nerd” in high-school folklore. If “geek” and “nerd” are synonyms, then “sports geek” might be an oxymoron. (Furthermore, “sports nerd” either doesn’t compute or means something else.)

In my mind, “geek” and “nerd” are related, but capture different dimensions of an intense dedication to a subject:

geek – An enthusiast of a particular topic or field. Geeks are “collection” oriented, gathering facts and mementos related to their subject of interest. They are obsessed with the newest, coolest, trendiest things that their subject has to offer.

nerd – A studious intellectual, although again of a particular topic or field. Nerds are “achievement” oriented, and focus their efforts on acquiring knowledge and skill over trivia and memorabilia.

Or, to put it pictorially à la The Simpsons:

Both are dedicated to their subjects, and sometimes socially awkward. The distinction is that geeks are fans of their subjects, and nerds are practitioners of them. A computer geek might read Wired and tap the Silicon Valley rumor-mill for leads on the next hot-new-thing, while a computer nerd might read CLRS and keep an eye out for clever new ways of applying Dijkstra’s algorithm. Note that, while not synonyms, they are not necessarily distinct either: many geeks are also nerds (and vice versa).

An Experiment

Do I have any evidence for this contrast? (By the way, this viewpoint dates back to a grad-school conversation with fellow geek/nerd Bryan Barnes, now a physicist at NIST.) The Wiktionary entries for “geek” and “nerd” lend some credence to my position, but I’d like something a bit more empirical…

“You shall know a word by the company it keeps” ~ J.R. Firth (1957)

To characterize the similarities and differences between “geek” and “nerd,” maybe we can find the other words that tend to keep them company, and see if these linguistic companions support my point of view?

Data and Method

(Note: If you’re neither a geek nor a nerd, don’t be scared by the math. It’s not too bad… or you can probably just skip to the “Results” subsection below…)

I analyzed two sources of Twitter data, since it’s readily available and pretty geeky/nerdy to boot. This includes a background corpus of 2.6 million tweets via the streaming API from between December 6, 2012, and January 3, 2013. I also sampled tweets via the search API matching the query terms “geek” and “nerd” during the same time period (38.8k and 30.6k total, respectively). Yes, yes, yes… I collected all the data six months ago but just now got around to crunching the numbers. It’s been a busy year!

A great little statistic for measuring how much company two words tend to keep is pointwise mutual information (PMI). It’s commonly used in the information retrieval literature to measure the co-occurrence of words and phrases in text, and it also turns out to be a good predictor of how humans evaluate semantic word similarity (Recchia & Jones, 2009) and topic model quality (Newman et al., 2010).

For two words w and v, the PMI is given by:

$$\mathrm{PMI}(w, v) = \log \frac{P(w, v)}{P(w)\,P(v)} = \log P(w \mid v) - \log P(w)$$

where in this case P(·) is the probability of the word(s) in question appearing in a random tweet, as estimated from the data. For instance, if we let v = “geek,” we compute the log-probability of a word w in the “geek” search corpus, and subtract the log-probability of w in the background corpus.

The PMI statistic measures a kind of correlation: a positive PMI score for two words means they “keep great company,” a negative score means they tend to keep their distance, and a score close to zero means they bump into each other more or less at random.

With that in mind, here is a scatterplot of various words according to their PMI scores for both “geek” and “nerd” on different axes (ignoring words with negative PMI, and treating #hashtags as distinct).
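For the curious, the PMI calculation described above can be sketched in a few lines of Python. This is only a rough illustration, not the code behind the actual analysis: the function names, the whitespace tokenization, and the toy tweets are all assumptions made up for the example. It estimates each word's probability as the fraction of tweets containing it, then subtracts the background log-probability from the log-probability in the “geek” search corpus.

```python
import math
from collections import Counter


def tweet_probabilities(tweets):
    """Estimate P(word) as the fraction of tweets that contain the word."""
    counts = Counter()
    for tweet in tweets:
        # Count each word once per tweet. Naive whitespace tokenization
        # (an assumption) keeps "#gadgets" distinct from "gadgets".
        counts.update(set(tweet.lower().split()))
    total = len(tweets)
    return {word: n / total for word, n in counts.items()}


def pmi_scores(search_tweets, background_tweets):
    """PMI(w, v) = log P(w | v) - log P(w), where every search tweet matches v."""
    p_given_v = tweet_probabilities(search_tweets)
    p_background = tweet_probabilities(background_tweets)
    return {
        word: math.log(p) - math.log(p_background[word])
        for word, p in p_given_v.items()
        if word in p_background  # skip words never seen in the background corpus
    }


# Toy data, invented purely for illustration.
geek_tweets = ["geek culture #gadgets", "geek out on gadgets", "the geek squad"]
background_tweets = ["the weather today", "the game tonight",
                     "geek culture", "new gadgets out"]

scores = pmi_scores(geek_tweets, background_tweets)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}: {scores[word]:+.3f}")
```

Words that never appear in the background corpus are simply skipped here, since their background log-probability is undefined; a real analysis would more likely smooth the counts or apply a minimum-frequency cutoff.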