Sentiment Analysis (PT-PT) – Introduction

In February I joined a startup project that wanted to apply machine learning to analyze social media content. I joined as a Data Scientist, and one of the first challenges I faced was sentiment analysis of posts. We realized that this had not been done for European Portuguese (PT-PT), at least not openly, so we decided to build it ourselves. The whole project was done in PySpark 1.6.1 (we wanted it to scale from early on), and the notebook can be found here. For obvious reasons I will not publicly provide the data used in this notebook. If, however, you wish to extract it yourself and need some help, please let me know and I’ll share some of the code I used to “crawl” Twitter.

This series of posts will follow the challenges and hurdles we faced and will be divided into four parts:

  1. Data Extraction, Cleaning and Preparation
  2. Modeling
  3. Evaluation
  4. Final Thoughts & Tips

I hope you find this interesting enough to comment. Feel free to leave any questions or email me directly.
