On Time

“Adams Square Station, looking southerly,” taken Sept. 10th, 1898.

Problem to Solve

Built to help people stay on time while moving around the city, the first subway tunnels in the United States are still in use today under Boston Common (not too far from Harvard itself!). Over 100 years later, the MBTA—the Massachusetts Bay Transportation Authority—manages public transportation around Boston, still ensuring everyone can stay on time, whether by subway, bus, railroad, or ferry.

In ontime.R, in a folder called ontime, write a program to tell users how likely they are to be on time while taking a particular MBTA route.

Distribution Code

For this problem, you’ll need to download ontime.R and two .csv files: bus.csv and rail.csv.

Download the distribution code

Open RStudio and navigate to the R console:

>

Next execute

getwd()

to print your working directory. Ensure your current working directory is where you’d like to download this problem’s distribution code. If using RStudio through cs50.dev, the recommended directory is /workspaces/NUMBER, where NUMBER is a number unique to your codespace.

If you do not see the right working directory, use setwd to change it! For example, if you are in the working directory of another problem, typing setwd("..") will move you one directory higher.

Next execute

download.file("https://cdn.cs50.net/r/2024/x/psets/2/ontime.zip", "ontime.zip")

in order to download a ZIP called ontime.zip into your codespace.

Then execute

unzip("ontime.zip")

to create a folder called ontime. You no longer need the ZIP file, so you can execute

file.remove("ontime.zip")

Now type

setwd("ontime")

followed by Enter to move yourself into (i.e., open) that directory. Your working directory should now end with

ontime/

If all was successful, executing

list.files()

should show bus.csv, ontime.R, and rail.csv. If not, retrace your steps and see if you can determine where you went wrong!

Schema

Before jumping in, it will be helpful to get a sense for the “schema” (i.e., organization!) of the data you’re given.

Learn about this data

In this problem, you are given two .csv files: bus.csv and rail.csv. Each file contains data directly from the MBTA’s Open Data Portal.

Each row in each CSV reflects an observation of the reliability of an MBTA service, known as “service reliability” by the MBTA. Let’s approach this idea in two parts:

  • Service: consider a “service” to be a mode of transportation, such as a bus or subway, traveling a particular route on a particular day. For example, on most days, the MBTA operates the Red Line subway service, which goes from Alewife to Braintree.
  • Reliability: consider “reliability” to be how often a service is on time. The MBTA calculates reliability quite simply:

    \[\text{reliability} = \frac{\text{numerator}}{\text{denominator}}\]

    That is, for each row, dividing the value in the numerator column by the value in the denominator column returns the proportion of the time that a particular service was on time; multiplying by 100 expresses it as a percentage.
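As a minimal sketch of that calculation, assume a small hypothetical data frame with the same numerator and denominator columns as the CSVs. The values below are made up for illustration and are not MBTA data, and the reliability and percent column names are illustrative choices:

```r
# Hypothetical rows; the numbers are invented for illustration only.
services <- data.frame(
  route = c("Red", "Red", "Blue"),
  numerator = c(90, 80, 45),
  denominator = c(100, 100, 50)
)

# Vectorized division yields one reliability value per row,
# expressed as a proportion between 0 and 1.
services$reliability <- services$numerator / services$denominator

# Multiply by 100 and round to express each value as a whole percentage.
services$percent <- round(services$reliability * 100)

print(services)
```

Because R's arithmetic operators work element by element on columns, no loop is needed here.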

Let’s explore the rest of the columns, for thoroughness:

  • year, which is the year in which the service took place.
  • month, which is the month in which the service took place.
  • day, which is the day on which the service took place.
  • mode, which is the service’s mode of transportation (e.g., “Bus” or “Rail”).
  • route, which is the route the service takes.
  • peak, which is whether the service took place during peak (busy) hours or off-peak (less busy) hours. This value is either “PEAK” or “OFF_PEAK”.
  • numerator, which is the numerator value used to calculate service reliability.
  • denominator, which is the denominator value used to calculate service reliability.
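Because both files share these columns, one way to work with them together is to stack them into a single data frame. The sketch below is only an illustration: it writes two tiny stand-in CSVs (with invented rows and only a subset of the columns) to a temporary folder so that it runs anywhere, and the names folder, bus, rail, and services are illustrative choices. In the problem itself you would read the real bus.csv and rail.csv from your working directory instead.

```r
# Write two tiny stand-in CSVs to a temporary folder so this sketch runs anywhere.
# The rows are invented; they are not MBTA data.
folder <- tempdir()
writeLines(c("route,peak,numerator,denominator", "86,PEAK,70,100"),
           file.path(folder, "bus.csv"))
writeLines(c("route,peak,numerator,denominator", "Blue,OFF_PEAK,92,100"),
           file.path(folder, "rail.csv"))

# Read route as text: a column holding only values like 86 would
# otherwise be parsed as numbers.
bus <- read.csv(file.path(folder, "bus.csv"), colClasses = c(route = "character"))
rail <- read.csv(file.path(folder, "rail.csv"), colClasses = c(route = "character"))

# rbind stacks data frames that have the same column names.
services <- rbind(bus, rail)
print(services)
```

Keeping route as text means a bus route such as 86 and a rail route such as Blue can be compared against user input in the same way.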

While you’re here, would you like to get a sense of the MBTA’s routes? Try looking at their schedules and maps!

Specification

In ontime.R, use the data provided in bus.csv and rail.csv to write a program that:

  1. Prompts the user to enter a route they intend to take.
  2. Outputs the mean reliability for all services along that route. Output two means: one for peak hours and one for off-peak hours. Express the mean as a percentage, rounded to the nearest whole percentage point.
    • For instance, if a user enters “Blue,” find the mean reliability, for both peak and off-peak hours, among all rows with “Blue” as the listed route.
  3. Tells the user to enter a valid route if they enter an invalid one.
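One possible reading of step 2 can be sketched on invented data: compute each row's reliability, keep the rows for the chosen route, and average separately for peak and off-peak hours. Averaging the per-row ratios (as here), rather than dividing summed numerators by summed denominators, is an assumption based on the phrase "mean reliability". The data frame, the variable names, and the output wording are all hypothetical.

```r
# Invented rows for illustration; not MBTA data.
services <- data.frame(
  route = c("Blue", "Blue", "Blue", "Blue", "Red"),
  peak = c("PEAK", "PEAK", "OFF_PEAK", "OFF_PEAK", "PEAK"),
  numerator = c(80, 90, 35, 40, 10),
  denominator = c(100, 100, 50, 50, 100)
)
services$reliability <- services$numerator / services$denominator

route <- "Blue"  # stands in for the user's input

# Keep only the rows for the chosen route.
chosen <- services[services$route == route, ]

# Average the per-row reliabilities within each peak category,
# then convert to a whole-number percentage.
peak_pct <- round(mean(chosen$reliability[chosen$peak == "PEAK"]) * 100)
off_peak_pct <- round(mean(chosen$reliability[chosen$peak == "OFF_PEAK"]) * 100)

cat("On time ", peak_pct, "% of the time during peak hours.\n", sep = "")
cat("On time ", off_peak_pct, "% of the time during off-peak hours.\n", sep = "")
```

With these invented rows, the sketch reports 85% for peak hours and 75% for off-peak hours; the real data will give different figures.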

If you haven’t already, learn more about the schema of this data before starting!

Advice

Consider the below as advice to help you on your way:

Use the %in% operator to validate a user's input

The logical operator %in% returns whether a given value is in a vector—TRUE or FALSE. For instance, consider the following:

"Red" %in% services$route

which would return

TRUE

if “Red” is among the values in the route column of the services data frame.
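A short sketch of how that check might drive validation, using an invented stand-in for the services data frame. The helper name is_valid_route and the message text are hypothetical, not part of the problem's distribution code:

```r
# Invented stand-in for the combined data; only the route column matters here.
services <- data.frame(route = c("Red", "Blue", "86"))

# A hypothetical helper that reports whether a route appears in the data.
is_valid_route <- function(route, services) {
  route %in% services$route
}

is_valid_route("Red", services)     # TRUE
is_valid_route("Purple", services)  # FALSE

# One way to act on the result.
route <- "Purple"  # stands in for the user's input
if (!is_valid_route(route, services)) {
  cat("Enter a valid route!\n")
}
```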

Usage

Assuming ontime.R is in your working directory, enter the below in the R console to test your program:

source("ontime.R")

How to Test

Here’s how to test your code manually:

  • Run your program with source("ontime.R"). Type “Blue”. Your program should output that the Blue line is on time 93% of the time during peak hours and 92% of the time during off-peak hours.
  • Run your program with source("ontime.R"). Type “86”. Your program should output that the 86 bus route is on time 72% of the time during peak hours and 65% of the time during off-peak hours.
  • Run your program with source("ontime.R"). Type “Purple”. Your program should let the user know that this is not a valid route.

check50

You can also check your code using check50, a program that CS50 will use to test your code when you submit. But be sure to test it yourself as well!

Run the following command in the RStudio console:

check50("cs50/problems/2024/r/ontime")

Green smilies mean your program has passed a test! Red frownies will indicate your program output something unexpected. Visit the URL that check50 outputs to see the input check50 handed to your program, what output it expected, and what output your program actually gave.

How to Submit

You can submit your code using submit50.

Keeping in mind the course’s policy on academic honesty, run the following command in the RStudio console:

submit50("cs50/problems/2024/r/ontime")

Acknowledgements

Data adapted from mbta-massdot.opendata.arcgis.com/datasets/b3a24561c2104422a78b593e92b566d5_0/explore.