Visualization Workshop Hackathon Challenge

Announcing the Hackathon Challenge for the Visual Text Analytics and Social Science Workshop, to be held at Imperial College London and the London School of Economics, 24-25 March 2015. Rules: Download the contest corpus, which consists of the US Presidential candidate debates from the current election. These are available as a zip file of plain texts and as a quanteda corpus object, which can be loaded in R using load(url("")). This …

Text analysis of the 10th Republican Presidential candidate debate using R and the quanteda package

On 25 February 2016, the tenth debate among the Republican candidates for the 2016 Presidential election took place in Houston, Texas, moderated by CNN. In this demonstration of the quanteda package, I will show how to download, import, and clean the debate transcript, parse it by speaker, and analyze each speaker's contributions. The first step is loading the debate into R. I took the text from the New York Times publication of the transcript. …

What text analysis software is available for Stata?

Many text analysis packages exist for R, such as quanteda, tm, qdap, and koRpus, but these are only useful if you are proficient in R programming. What about users of alternative statistical packages, such as Stata? It turns out that recent versions of Stata have made huge strides in this area. Stata 13 introduced a new “long string” data type, strL, that can be of …

Encoding headaches, emoticons, and R’s handling of UTF-8/16

I was recently asked for help by a colleague (@kmmunger) whose code was choking while cleaning tokenized texts from Twitter data. The tweets were in the JSON format that comes from the Twitter API, in what we thought was UTF-8 encoding. It turns out these tweets used some emoticons from the nosebleed section of the Unicode maps, and these were not being read properly into R, as quanteda was being used …

How to install the R package topicmodels on OS X

Many people have reported problems when attempting to install the R package topicmodels on OS X Mavericks or Yosemite. The problem is that the binaries are not yet built for these versions of OS X, and you need additional software installed in order to build the package from source. Once built from source, however, it seems to work fine. Here is the solution that worked for …