Day03: APIs and sentiment analysis

Posted by csiu on February 27, 2017 | with: 100daysofcode, Text mining, Machine Learning

My defense is tomorrow and so I wanted to do something simple. Something that will not take a lot of time. I started with getting data from APIs, but somehow ended up working on sentiment analysis…

The Jupyter Notebook for this little project is found here.

Connecting to Reddit

I wanted to access Reddit data. One way to access data is through APIs.

APIs stands for application programming interfaces and they allow third party developers to access and design products that are powered by a company’s service.

Searching online, the praw package (which stands for The Python Reddit API Wrapper) looks to do what I want. Additionally, if there were no API wrappers available, access can be done through the API directly as is demonstrated in How to use Reddit API in Python.

Data mining for a data set

Now that we connected to Reddit, I need a target. The AskReddit subreddit is always interesting and full of opinions.

Men of Reddit, what’s the biggest “I’m a princess” red flag? (post submitted by thewhackcat, Feb 2017)

I found my target (and made this data available on Gist).

Sentiment analysis

Now that I have a collection of words, I need to do something with this data. Originally I wanted to count the number a times a word appears, but I wanted to know how people feel more.

I followed the “Sentiment Analysis” posting to train a Naive Bayes Classifier to classify the sentiment of a sentence and found that most sentences in this topic are neutral at the time of analysis (x-axis is score).

The top 5 most negative (uncensored) comments were:

1. I was playing as the high chief of Novgorod today (starts as holomomor or something) and I had to kill that motherfucker who owns the only port at the start of the game and holy shit that guy survived more assassination attempts than I have ever seen.

2. "You stupid lemon stealing whore"

3. Knew a girl who sucked dick behind Macy's, was a total princess and got pissed off when everyone found out she was more trashy than her Facebook statuses had suggested

4. Relationships get fucking adversarial and antagonistic so often and that's total cancer for a relationship.

5. Finally just said, "Fuck it," and imprisoned his ass and executed him.

And the top 5 most positive comments were:

1. I still do have more male friends than female, but friends are friends, so who cares?

2. She was fun, had a great sense of humor and she was absolutely gorgeous.

3. "Father" is the person who provided half your DNA, "dad" is the person who lovingly raised you to the best of his ability.

4. One of my best friends had a girlfriend once say she likes to have nicer things than other people, but doesn't want to work so he needs to have a good job.

5. Turns out, she was an actual direct descendant and princess of Mowri royalty in hawai'i and meant she was a princess in that she always acted with grace, charm and dignity because she represented her family, which was greater than just her.

It’s good to end on a positive note.