Tracking Discussion on the r/NBA Subreddit
This is a copy of a post I made on r/NBA that received over 100K views, 300 upvotes and over 100 comments.
At the end of this summer I made a site, nbamentions.com, that tracks the number of times players and teams are mentioned in the r/NBA subreddit. It also shows which r/NBA users have been posting the most. It's been collecting data for 4 months (since Aug 2) and is now ready to be shared. Below I'll show the results, interesting trends that have emerged, and some insight into how I built it along with the challenges I faced.
Most Mentioned Players
LeBron absolutely dominates the discussions here. He's mentioned significantly more than the next closest player or team. Unsurprisingly, a player's involvement in a media controversy greatly increases their mentions. For example, half of Kyrie's mentions in the past 4 months came in the 2-week period from Oct 29 to Nov 11. He went from averaging ~200 mentions a day to over 2,000 mentions a day in that period. Similarly, Draymond Green and Jordan Poole went from averaging a combined 225 mentions a day to over 2,400 mentions a day in the 10 days following the punch.
Most Mentioned Teams
- Los Angeles Lakers - 49,600 mentions
- Golden State Warriors - 41,500 mentions
- Brooklyn Nets - 36,100 mentions
There's not much interesting to say here, except that how much a team is mentioned correlates much more with its market size than with how good it is ;).
Most Active Users
In total over the last 4 months, r/NBA has generated over 900,000 comments written by over 95,000 users, for an average of 9.5 comments per user. That's a lot! However, the discussion is dominated by the most frequent posters, who average over 19 comments per day. In particular, u/RubbleWestbrick is by far the most prolific poster, averaging over 50 comments per day!
Other Features
There are a number of other things nbamentions.com can do. You can filter by time frame to see who is trending in the past day, week, month, etc. A flame emoji beside a player or team's name indicates that they are being mentioned more than usual in the past few hours. If you click on a player or team's name, you can see which user has commented about them the most, as well as a list of all the comments that have mentioned them. Likewise, clicking on a user's name will show their comments along with which player and team they comment about the most.
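To give a feel for how a "mentioned more than usual" flag like the flame emoji could work, here is a minimal sketch. It is not the site's actual code: the function name `is_trending`, the 3-hour window, the 30-day baseline, and the 2x threshold are all assumptions picked for illustration.

```python
from datetime import datetime, timedelta


def is_trending(mention_times, now, recent_hours=3, baseline_days=30, ratio=2.0):
    """Return True if the recent hourly mention rate exceeds `ratio` times
    the baseline hourly rate. All thresholds are illustrative assumptions."""
    recent_start = now - timedelta(hours=recent_hours)
    baseline_start = now - timedelta(days=baseline_days)

    # Mentions inside the recent window vs. the rest of the baseline window.
    recent = sum(1 for t in mention_times if recent_start <= t <= now)
    baseline = sum(1 for t in mention_times if baseline_start <= t < recent_start)

    recent_rate = recent / recent_hours
    baseline_rate = baseline / (baseline_days * 24 - recent_hours)

    if baseline_rate == 0:
        # No history to compare against: any recent mention counts as a spike.
        return recent > 0
    return recent_rate > ratio * baseline_rate
```

Comparing hourly rates rather than raw counts keeps the two windows comparable even though they have very different lengths.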
How It's Built
I'm using the Python Reddit API Wrapper (PRAW) to scrape r/NBA for comments that contain a player or team's name. Each of these comments is stored in a SQL database so that it can be queried by the site. The site itself is an Angular app with a Flask API for the backend.
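The scrape-and-store step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's real code: the alias table, the `mentions` schema, the helper names, and the choice of SQLite (the post only says "a SQL database") are all hypothetical. The PRAW calls used (`praw.Reddit`, `subreddit(...).stream.comments`) are part of the library, but the credentials are placeholders you would supply yourself.

```python
import re
import sqlite3

# Hypothetical alias table; a real one would cover every player and team.
ALIASES = {
    "LeBron James": ["lebron", "lebron james"],
    "Los Angeles Lakers": ["lakers"],
}


def compile_patterns(aliases):
    """Build one case-insensitive, word-bounded regex per player or team."""
    return {
        entity: re.compile(
            r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
            re.IGNORECASE,
        )
        for entity, names in aliases.items()
    }


def find_mentions(body, patterns):
    """Return the entities whose aliases appear in a comment body."""
    return [entity for entity, pat in patterns.items() if pat.search(body)]


def init_db(conn):
    # Assumed schema: one row per (comment, entity) pair.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mentions (
               comment_id TEXT, author TEXT, entity TEXT,
               created_utc REAL, body TEXT,
               PRIMARY KEY (comment_id, entity))"""
    )


def store_comment(conn, patterns, comment_id, author, body, created_utc):
    """Insert one row per mentioned entity; duplicates are ignored."""
    entities = find_mentions(body, patterns)
    conn.executemany(
        "INSERT OR IGNORE INTO mentions VALUES (?, ?, ?, ?, ?)",
        [(comment_id, author, e, created_utc, body) for e in entities],
    )
    conn.commit()
    return entities


def run_stream(conn, patterns, client_id, client_secret, user_agent):
    """Stream new r/NBA comments via PRAW and store those with mentions."""
    import praw  # imported here so the helpers above work without PRAW

    reddit = praw.Reddit(
        client_id=client_id, client_secret=client_secret, user_agent=user_agent
    )
    for comment in reddit.subreddit("nba").stream.comments(skip_existing=True):
        author = str(comment.author) if comment.author else "[deleted]"
        store_comment(
            conn, patterns, comment.id, author, comment.body, comment.created_utc
        )
```

Keeping the matching and storage helpers separate from the PRAW loop means they can be tried against an in-memory database (`sqlite3.connect(":memory:")`) without any Reddit credentials.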