Day61: Words in title

Posted by csiu on April 26, 2017 | with: 100daysofcode

It’s been a long and full day, but I wanted to recreate the first image from the following blog post by David Robinson, using my Kickstarter data.

[He does] a simple analysis, examining what words tend to occur at particular points within a story, including words that characterize the beginning, middle, or end.


Workflow:

  1. Get data & preprocess the text by running this Jupyter Notebook
  2. Further wrangle the data and generate the plot by running this R script
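
To make the two steps concrete, here is a minimal Python sketch of the idea, not the actual notebook or R script. The stop-word list, the example titles, and the function names `preprocess` and `median_positions` are all made up for illustration, and measuring position as `index / (n - 1)` is just one reasonable choice.

```python
import re
from collections import defaultdict
from statistics import median

# Hypothetical stop-word list, for illustration only; a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}

def preprocess(title):
    """Lowercase, keep runs of letters only (this drops digits), and remove stop words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def median_positions(titles):
    """Map each word to the median of its relative positions (0 = start, 1 = end)."""
    positions = defaultdict(list)
    for title in titles:
        tokens = preprocess(title)
        n = len(tokens)
        if n < 2:
            continue  # relative position is undefined for empty or one-word titles
        for i, tok in enumerate(tokens):
            positions[tok].append(i / (n - 1))
    return {word: median(vals) for word, vals in positions.items()}

# Made-up titles, not taken from the Kickstarter data.
titles = [
    "The first smart watch for kids",
    "A smart lamp for your desk",
    "Help us record our first album",
]
result = median_positions(titles)
```

Words with a median near 0 tend to open a title and words near 1 tend to close it, which is the quantity the plot is built around.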

In hindsight, the median word positions might have shifted slightly had I not removed stop words and digits. The purpose of the text preprocessing was to standardize the words so that different variations of a word are counted as the same word, for instance “messaging” and “messages”. It's a trade-off: cleaner word counts in exchange for slightly distorted positions.
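
A small sketch of how removing stop words can move a word's position. The title, the stop-word list, and the helper `relative_position` are hypothetical and only meant to show the effect on a single example.

```python
# Hypothetical stop-word list and title, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}

def relative_position(tokens, word):
    """Relative position of the first occurrence of word: 0 = first token, 1 = last."""
    return tokens.index(word) / (len(tokens) - 1)

title = "the art of making a game for kids"
all_tokens = title.split()
kept_tokens = [t for t in all_tokens if t not in STOP_WORDS]

# Position of "game" before and after stop-word removal.
with_stop_words = relative_position(all_tokens, "game")
without_stop_words = relative_position(kept_tokens, "game")
```

Here “game” sits 5/7 of the way through the full title but 2/3 of the way through once the stop words are dropped, so the shift is small but real.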