NLP / Text Mining

Public Sentiment NLP – Reputational Risk

Sentiment analysis of public reaction on X (Twitter) to the release of the Epstein documents. Tweets were collected with Tweet-Harvest (Node.js), stored in Google BigQuery, processed through a multi-stage NLP pipeline, and scored with VADER.

Detailed Insights

Data Collection & Pipeline

Indonesian-language tweets matching the keywords 'epstain' or 'epstein' were scraped, then cleaned (Regex), tokenized, and stemmed (Sastrawi). Each tweet was labeled by its VADER compound score.
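The cleaning, tokenizing, and labeling steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the regex patterns, and the 0.05 cutoff (the threshold commonly suggested for VADER compound scores) are all assumptions, and the Sastrawi step is omitted so the example runs with the standard library only.

```python
import re

# Hypothetical patterns; the project's real cleaning rules are not shown in the source.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, digits, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # also drops the '#' of hashtags
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split a cleaned tweet into whitespace-delimited tokens."""
    return clean_tweet(text).split()


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed default, not a value confirmed
    by the project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

In a full pipeline, the compound score passed to `label_from_compound` would come from a VADER analyzer, for example the `"compound"` entry returned by `polarity_scores` in NLTK's `SentimentIntensityAnalyzer`.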

Sentiment Findings

Negative sentiment dominates at about 49.6% of tweets. The frequency of 'war' (176) points to a dominant geopolitical narrative, while 'child' (92) reflects public anger. Together, these suggest a highly emotionally charged discourse.
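Figures like the sentiment share and the per-word counts above can be derived with a simple tally. The sketch below shows one way to do it. The helper names are hypothetical and the sample inputs are toy data, not the project's dataset.

```python
from collections import Counter


def sentiment_shares(labels: list[str]) -> dict[str, float]:
    """Return each label's share of all tweets, as a percentage rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def top_words(token_lists: list[list[str]], n: int = 10,
              stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Count tokens across all tweets and return the n most frequent."""
    counts = Counter(
        token
        for tokens in token_lists
        for token in tokens
        if token not in stopwords
    )
    return counts.most_common(n)


# Toy inputs for illustration only.
example_labels = ["negative", "negative", "positive", "neutral"]
example_tokens = [["war", "evidence"], ["evidence", "child"]]

shares = sentiment_shares(example_labels)
most_common = top_words(example_tokens, n=1)
```

Passing a stopword set to `top_words` keeps function words from crowding out content words such as 'war' or 'child'.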

Recommendations

Increase transparency in reporting to build public trust. Narratives should strike a better balance between legal facts and geopolitical context.

Tech Stack

VADER, NLTK, Sastrawi, BigQuery, Tweet-Harvest

Key Results

2,262 tweets analyzed
49.6% Negative, 25.6% Positive
Top words: evidence (318), war (197)

View Full Project on GitHub