Rap Over Time: A Textual Data Analysis

Rennah Weng
6 min read · Apr 19, 2021

Written By: Amy Huang, Lindsey Weiskopf, Matthew Ottomano, and Rennah Weng

Introduction

Our project explores the relationship between time, words, and hip-hop. We chose rap as an axis for analysis because of its prolific prose, its responsiveness to the state of the world around each artist, and the rapid changes in the rap industry over the past four decades. Our investigation was guided by research into the history of hip-hop, with the aim of capturing each period as authentically as possible. With that context in hand, the viewer can explore each moment through the words of rap.

Related Work & Inspirations

  • This project from The Pudding analyzes unique word counts by hip-hop artists. Extending this, we decided to explore rap artists’ diction across different eras.
  • This FiveThirtyEight project shows the relation between hip-hop and politics, inspiring us to parse semantic themes in rap.
  • Hip-Hop Word Count shows what rap lyrics have captured about rap culture over time.

Methodology

Our methodology can be defined by three stages:

1. Research

  • We conducted preliminary academic research in order to gain a better understanding of hip-hop culture. This scholarship guided our data exploration.
  • From our research and a review of existing textual analyses, we posed the research question, “How has rap changed over time?” We referenced the book RAP: A Juxtaposition of the Eras by V.L Collins Jr to split our data into four eras: Foundation, Golden, Millennial, and Zoomer.

2. Data Collection

  • We identified the 10 most popular artists of each era and each artist’s most popular album. Then, we used Genius to collect the lyrics of every song on those albums.
  • We used Python libraries to tokenize the lyrics, remove stop words, and calculate word frequencies. We compiled the results into CSV files, which were later converted into JSON files. Finally, we used SQL to rank words by total count and then split each count by era.
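
The processing steps above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the code we ran: the stop word list, the function names, and the sample lyrics are all made up for the example, and the ranking that we did in SQL is done here in Python instead.

```python
import json
import re
from collections import Counter

# Hypothetical, tiny stop word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "and", "to", "i", "in", "my", "on", "it", "of"}


def tokenize(lyrics):
    """Lowercase the lyrics and split them into word tokens."""
    return re.findall(r"[a-z']+", lyrics.lower())


def count_words(lyrics):
    """Count every token that is not a stop word."""
    return Counter(t for t in tokenize(lyrics) if t not in STOP_WORDS)


def rank_by_era(lyrics_by_era):
    """Rank words by total count, keeping the per-era breakdown."""
    per_era = {era: count_words(text) for era, text in lyrics_by_era.items()}
    totals = Counter()
    for counts in per_era.values():
        totals.update(counts)
    return [
        {
            "word": word,
            "total": total,
            "by_era": {era: per_era[era][word] for era in per_era},
        }
        for word, total in totals.most_common()
    ]


# Made-up lines standing in for lyrics collected from Genius.
sample = {
    "Foundation": "Money on my mind, money in the bank",
    "Golden": "The police and the money",
}
ranked = rank_by_era(sample)
print(json.dumps(ranked[0]))
```

Each entry in `ranked` carries a word’s total count along with its count in every era, which is the shape a treemap split by era needs.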

3. Understanding Data

  • We conducted exploratory data analysis on our dataset of rap lyrics across the four eras. For our semantic analysis, we trained a semantic parser and word association model on rap lyrics and then applied the trained model to our dataset.

Design

Our design was primarily guided by the goal of contextualized exploration. We wanted to first give the viewer background information in a simple and fun way. We also wanted to keep our color schemes consistent throughout our visualizations to emphasize how we chose to separate our data. We initially attempted a more narrative visualization, but we decided that exploratory visualizations better fit our goals and kept certain biases out of our presentation of the data.

Implementation

We started our visualizations by looking through the Observable Gallery for inspiration. After collecting, cleaning, and exploring our data, we adapted the searchable, zoomable treemap template and the chord diagram template to represent our data. We created the timeline from scratch.

For the chord diagram, we used the Python library Gensim to train a model on rap songs from all four eras. We then used this model to find words semantically similar to the themes we had researched, such as love, god, and family. Finally, we scanned our data for these words and summed their occurrences into categories like love, god, and family. When we first used this data in our chord diagram, processing took so long that Observable mistook it for an infinite loop. To solve this, we normalized the weights in our matrices.
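
As a rough sketch of the last two steps, the snippet below sums related-word counts into themes and then normalizes the resulting matrix. It rests on several assumptions: the related-word lists are hard-coded stand-ins for what the trained model returned (for example, through a `most_similar` query on a Gensim word-vector model), the counts are made up, the matrix is laid out as themes by eras, and dividing by the matrix total is only one plausible way to normalize.

```python
# Hypothetical theme -> related-word lists; in the project these came
# from the trained model rather than being written by hand.
THEME_WORDS = {
    "love": ["love", "heart"],
    "family": ["family", "mama"],
}


def theme_matrix(counts_by_era, theme_words):
    """Sum related-word counts into one row per theme, one column per era."""
    eras = list(counts_by_era)
    return [
        [sum(counts_by_era[era].get(word, 0) for word in words) for era in eras]
        for words in theme_words.values()
    ]


def normalize(matrix):
    """Scale every weight so that the whole matrix sums to 1."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        # Nothing to scale; return a copy unchanged.
        return [row[:] for row in matrix]
    return [[value / total for value in row] for row in matrix]


# Made-up per-era word counts.
counts = {
    "Foundation": {"love": 3, "heart": 1, "mama": 2},
    "Golden": {"love": 1, "family": 1},
}
raw = theme_matrix(counts, THEME_WORDS)
weights = normalize(raw)
print(raw)
print(weights)
```

Normalizing keeps the proportions between themes and eras intact while shrinking the raw magnitudes passed to the chart.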

Discussion

After several iterations of our final visualizations, we’re satisfied with our encodings and the progress that we have made throughout this process. Audiences loved our searchable treemap, which allows them to explore interesting words like “money” and “police” and see their distribution across eras. However, they would have liked search suggestions so that they could quickly explore words that we found interesting. Audiences found the chord diagram confusing and were curious about how we chose its themes. They asked for further documentation and for the inclusion of additional themes. We want to emphasize that our main goal is to present exploratory visualizations that lead readers to explore various findings in rap lyrics and link them with historical context to make their own interpretations!

Future Work

An interesting visualization to explore next is a sentence tree, which would show where certain words appear within sentences to provide more context.

In addition, we recognize that there are different sub-cultures within hip-hop and many more influential artists than we covered. We tried to include a mix of artists; however, isolating different sub-genres and analyzing them by word and semantics could provide useful insights into the differences within hip-hop.

Acknowledgments

THANK YOU to Professor Chang, Aviva, and Rounak for listening to our concerns and helping us find solutions.

We are grateful for the feedback our classmates gave during our in-progress critique; their comments propelled us toward a more thoughtful design process.

Thank you for reading!
