On Time

Adams Square
“Adams Square Station, looking southerly,” taken on Sept. 10, 1898.

Problem to Solve

The first subway tunnels in the United States, built to help people stay on time while moving around the city, are still in use today under Boston Common (not too far from Harvard itself!). Over 100 years later, the MBTA—the Massachusetts Bay Transportation Authority—manages public transportation around Boston, still ensuring everyone can stay on time, whether by subway, bus, railroad, or ferry.

The MBTA has asked you to clean up a data set it will use to calculate its service reliability, putting the data in a SQLite database for easy use and maintenance.

Distribution Code

Download the distribution code

Log into cs50.dev, click on your terminal window, and execute cd by itself. You should find that your terminal window’s prompt resembles the below:

$

Next execute

wget https://cdn.cs50.net/sql/2025/spring/sections/3/ontime.zip

in order to download a ZIP called ontime.zip into your codespace.

Then execute

unzip ontime.zip

to create a folder called ontime. You no longer need the ZIP file, so you can execute

rm ontime.zip

and respond with “y” followed by Enter at the prompt to remove the ZIP file you downloaded.

Now type

cd ontime

followed by Enter to move yourself into (i.e., open) that directory. Your prompt should now resemble the below.

ontime/ $

If all was successful, you should execute

ls

and see a database named ontime.db alongside a rail.csv file.

Specification

Write a series of SQL (and SQLite) statements to import and clean the data from rail.csv into a table, rail, in a database called ontime.db.

Within ontime.db, the rail table is already created for you.

To consider the data in the rail table clean, you should ensure…

  1. The data includes only those rows related to “PEAK” transit times.
  2. All green line trains (e.g., “Green-D”, “Green-E”, etc.) are listed simply as “Green.”
  3. Each row has a unique ID.

Advice

Begin by importing rail.csv into a temporary table

Start by getting all of the data from rail.csv into a temporary table, one called rail_temp. This table was created for you already in ontime.db.

A temporary table is a helpful placeholder: you can use it to clean your data until it’s in a form that’s suitable for your final rail table.

Recall that .import is a SQLite statement that can import a CSV file into a table of your choice.

Clean the imported data

With your data in a temporary table, continue writing SQL statements to clean the data.

You’ll need to use UPDATE and DELETE statements to ensure the data includes only those rows related to “PEAK” transit times, and that all green line trains (e.g., “Green-D”, “Green-E”, etc.) are listed simply as “Green.”

Transfer the data from your temporary table into the rail table

Recall that you can INSERT values into a new table by SELECTing rows from another:

INSERT INTO "table0" ("column0", "column1")
SELECT "column0", "column1" FROM "table1";

If your table has a primary key column, as rail does, a unique ID should be automatically generated for each row you insert.