Lab 7: Songs

Write SQL queries to answer questions about a database of songs.

Getting Started

Open VS Code.

Start by clicking inside your terminal window, then execute cd by itself. You should find that its “prompt” resembles the below.

$

Click inside of that terminal window and then execute

wget https://cdn.cs50.net/2022/fall/labs/7/songs.zip

followed by Enter in order to download a ZIP called songs.zip in your codespace. Take care not to overlook the space between wget and the following URL, or any other character for that matter!

Now execute

unzip songs.zip

to create a folder called songs. You no longer need the ZIP file, so you can execute

rm songs.zip

and respond with “y” followed by Enter at the prompt to remove the ZIP file you downloaded.

Now type

cd songs

followed by Enter to move yourself into (i.e., open) that directory. Your prompt should now resemble the below.

songs/ $

If all was successful, you should execute

ls

and you should see 8 .sql files, songs.db, and answers.txt.

If you run into any trouble, follow these same steps again and see if you can determine where you went wrong!

Understanding

Provided to you is a file called songs.db, a SQLite database that stores data from Spotify about songs and their artists. This dataset contains the top 100 streamed songs on Spotify in 2018. In a terminal window, run sqlite3 songs.db so that you can begin executing queries on the database.

First, when sqlite3 prompts you to provide a query, type .schema and press enter. This will output the CREATE TABLE statements that were used to generate each of the tables in the database. By examining those statements, you can identify the columns present in each table.

Notice that every artist has an id and a name. Notice, too, that every song has a name, an artist_id (corresponding to the id of the artist of the song), as well as values for the danceability, energy, key, loudness, speechiness (presence of spoken words in a track), valence, tempo, and duration of the song (measured in milliseconds).

The challenge ahead of you is to write SQL queries to answer a variety of different questions by selecting data from one or more of these tables. After you do so, you’ll reflect on the ways Spotify might use this same data in their annual Spotify Wrapped campaign to characterize listeners’ habits.

Implementation Details

For each of the following problems, you should write a single SQL query that outputs the results specified by each problem. Your response must take the form of a single SQL query, though you may nest other queries inside of your query. You should not assume anything about the ids of any particular songs or artists: your queries should be accurate even if the id of any particular song or person were different. Finally, each query should return only the data necessary to answer the question: if the problem only asks you to output the names of songs, for example, then your query should not also output each song’s tempo.

  1. In 1.sql, write a SQL query to list the names of all songs in the database.
    • Your query should output a table with a single column for the name of each song.
  2. In 2.sql, write a SQL query to list the names of all songs in increasing order of tempo.
    • Your query should output a table with a single column for the name of each song.
  3. In 3.sql, write a SQL query to list the names of the top 5 longest songs, in descending order of length.
    • Your query should output a table with a single column for the name of each song.
  4. In 4.sql, write a SQL query that lists the names of any songs that have danceability, energy, and valence greater than 0.75.
    • Your query should output a table with a single column for the name of each song.
  5. In 5.sql, write a SQL query that returns the average energy of all the songs.
    • Your query should output a table with a single column and a single row containing the average energy.
  6. In 6.sql, write a SQL query that lists the names of songs that are by Post Malone.
    • Your query should output a table with a single column for the name of each song.
    • You should not make any assumptions about what Post Malone’s artist_id is.
  7. In 7.sql, write a SQL query that returns the average energy of songs that are by Drake.
    • Your query should output a table with a single column and a single row containing the average energy.
    • You should not make any assumptions about what Drake’s artist_id is.
  8. In 8.sql, write a SQL query that lists the names of the songs that feature other artists.
    • Songs that feature other artists will include “feat.” in the name of the song.
    • Your query should output a table with a single column for the name of each song.

Walkthrough

Hints

Not sure how to solve?

Spotify Wrapped

Spotify Wrapped is a feature presenting Spotify users’ 100 most played songs from the past year. In 2021, Spotify Wrapped calculated an “Audio Aura” for each user, a “reading of [their] two most prominent moods as dictated by [their] top songs and artists of the year.” Suppose Spotify determines an audio aura by looking at the average energy, valence, and danceability of a person’s top 100 songs from the past year. In answers.txt, reflect on the following questions:

  • If songs.db contains the top 100 songs of one listener from 2018, how would you characterize their audio aura?
  • Hypothesize about why the way you’ve calculated this aura might not be very representative of the listener. What better ways of calculating this aura would you propose?

Be sure to submit answers.txt along with each of your .sql files!

Testing

Execute the below to evaluate the correctness of your code using check50.

check50 cs50/labs/2022/fall/songs

How to Submit

  1. Download each of your 8 .sql files (named 1.sql, 2.sql, …, 8.sql) and answers.txt by control-clicking or right-clicking on the files in your codespace’s file browser and choosing Download.
  2. Go to CS50’s Gradescope page.
  3. Click “Lab 7: Songs”.
  4. Drag and drop your .sql files and answers.txt to the area that says “Drag & Drop”. Be sure that each file is correctly named and that you submit all files in one bundle.
  5. Click “Upload”.

You should see a message that says “Lab 7: Songs submitted successfully!”

Want to see the staff's solution?
-- 1.sql
-- All songs in the database.
SELECT name
FROM songs;

-- 2.sql
-- All songs in increasing order of tempo.
SELECT name
FROM songs
ORDER BY tempo;

-- 3.sql
-- The names of the top 5 longest songs, in descending order of length.
SELECT name
FROM songs
ORDER BY duration_ms
DESC LIMIT 5;

-- 4.sql
-- The names of any songs that have danceability, energy, and valence greater than 0.75.
SELECT name
FROM songs
WHERE danceability > 0.75 AND energy > 0.75 AND valence > 0.75;

-- 5.sql
-- The average energy of all the songs.
SELECT AVG(energy)
FROM songs

-- 6.sql
-- The names of songs that are by Post Malone.
-- Using nested SELECTs
SELECT name
FROM songs
WHERE artist_id =
(
    SELECT id
    FROM artists
    WHERE name = 'Post Malone'
);

-- Using JOIN
SELECT songs.name
FROM songs
JOIN artists ON songs.artist_id = artists.id
WHERE artists.name = 'Post Malone';

-- 7.sql
-- The average energy of songs that are by Drake
-- Using nested SELECTs
SELECT AVG(energy)
FROM songs
WHERE artist_id =
(
    SELECT id
    FROM artists
    WHERE name = 'Drake'
);

-- Using JOIN
SELECT AVG(songs.energy)
FROM songs
JOIN artists ON songs.artist_id = artists.id
WHERE artists.name = 'Drake';

-- 8.sql
-- The names of songs that feature other artists.
SELECT name
FROM songs
WHERE name LIKE '%feat.%';

Acknowledgements

Dataset from Kaggle.