Exploratory Data Analysis
Problem to Solve
Exploratory data analysis (EDA) is a method used by data scientists to find interesting characteristics of data and test hypotheses. It’s often one of the initial steps a researcher might take in a larger data analysis program.
You now have many of the tools you need to perform your own exploratory data analysis! In a program called eda.R, in a folder called eda, write a program to visualize and explore a data set of your choice.
Getting Started
For this problem, you’ll need to create eda.R in a folder called eda.
Create eda.R
Open RStudio per the linked steps and navigate to the R console:
>
Next, execute getwd() to print your working directory. Ensure your current working directory is where you’d like to create this problem’s folder. If using RStudio through cs50.dev, the recommended directory is /workspaces/NUMBER, where NUMBER is a number unique to your codespace.
If you do not see the right working directory, use setwd to change it! If you are in the working directory of another problem, try typing setwd(".."), which will move you one directory higher.
Next, execute dir.create("eda") to create a folder called eda in your codespace.
Now type setwd("eda") followed by Enter to move yourself into (i.e., open) that directory. Your working directory should now end with
eda/
Finally, type file.create("eda.R") to create a file called eda.R inside of the eda folder.
If all was successful, executing list.files() should show eda.R. If not, retrace your steps and see if you can determine where you went wrong!
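The console steps above can be collected into one short sketch. The helper name create_problem_files and its parent argument are hypothetical, introduced here only for illustration; the calls themselves (dir.create, file.create, list.files) are the ones used in the steps. Unlike the console steps, this sketch does not call setwd, so your working directory is left unchanged.

```r
# Hypothetical helper (not part of the problem set): creates eda/eda.R
# under `parent` and returns the names of the files in the new folder.
create_problem_files <- function(parent = getwd()) {
  folder <- file.path(parent, "eda")

  # dir.create() warns if the folder already exists, so check first
  if (!dir.exists(folder)) {
    dir.create(folder)
  }

  script <- file.path(folder, "eda.R")

  # file.create() truncates an existing file, so only create it once
  if (!file.exists(script)) {
    file.create(script)
  }

  list.files(folder)
}
```

Calling create_problem_files() from the directory where you want the folder, followed by setwd("eda"), should leave you in the same place as the steps above.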
Specification
The only requirement for this problem is that you produce a visualization you care about, that is interesting to you, and that you feel proud of. Oh, and that you save the visualization in a file called visualization.png!
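As a starting point, here is a minimal sketch of what eda.R might contain. It assumes base R graphics and the built-in mtcars data set, both chosen here only for illustration; the problem leaves the data set and the plotting approach up to you. The one fixed detail is the output file name, visualization.png.

```r
# Illustrative only: mtcars ships with R; swap in a data set you care about.
data <- mtcars

# Take a quick look at the data before plotting
str(data)
summary(data$mpg)

# Open a PNG graphics device, draw the plot, then close the device
# so that the file is actually written to disk.
png("visualization.png", width = 800, height = 600)
plot(
  data$wt, data$mpg,
  main = "Fuel Efficiency vs. Weight",
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  pch = 19
)
dev.off()
```

If you plot with ggplot2 instead, ggsave("visualization.png") serves the same purpose of writing the plot to a file.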
You might find it helpful to get a bit of inspiration:
Usage
Assuming eda.R is in your working directory, enter the below in the R console to test your program:
source("eda.R")
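Since the specification asks for a file called visualization.png, one quick check after running the program is whether that file now exists. A minimal sketch, where run_and_check is a hypothetical helper name and not part of the problem set:

```r
# Hypothetical helper: runs a script and reports whether the expected
# output file is present in the current working directory afterwards.
run_and_check <- function(script = "eda.R", output = "visualization.png") {
  if (!file.exists(script)) {
    stop("Cannot find ", script, " in ", getwd())
  }

  source(script)
  file.exists(output)
}
```

run_and_check() returns TRUE when visualization.png is present after the script finishes. Note that a file left over from an earlier run would also count, so delete the old file first if you want a clean check.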
How to Test
There isn’t a specific way to test your code for this problem, other than trial and error until you produce the visualization you’re looking for!
check50
You can also check your code using check50, a program that CS50 will use to test your code when you submit. But be sure to test it yourself as well!
Run the following command in the RStudio console:
check50("cs50/problems/2024/r/eda")
Green smilies mean your program has passed a test! Red frownies will indicate your program output something unexpected. Visit the URL that check50 outputs to see the input check50 handed to your program, what output it expected, and what output your program actually gave.
How to Submit
After you submit, be sure to check your autograder results. If you see SUBMISSION ERROR: missing files (0.0/1.0), it means your file was not named exactly as prescribed (or you uploaded it to the wrong problem).
Correctness in submissions entails everything from reading the specification, to writing code that complies with it, to submitting files with the correct names. If you see this error, you should resubmit right away, making sure your submission is fully compliant with the specification. The staff will not adjust your filenames for you after the fact!
In RStudio, select the eda.R file you created for this problem, as well as any data files your program needs to run. With the files selected, click on the icon at the top of the file explorer. Choose Export, name your file eda-solution.zip, then click Download.
Go to CSCI E-5a’s Gradescope page.
Click Problem Set 5: Exploratory Data Analysis.
Unzip your eda-solution.zip file. Open the folder. Drag and drop your eda.R file and any data files your program needs to the area that says Drag & Drop. Be sure that your eda.R file is named exactly as prescribed above, lest the autograder fail to run on your submission! Note that your submission is considered incomplete if any of the files are missing, so be sure they’re all there!
Click Upload.
You should see a message that says “Problem Set 5: Exploratory Data Analysis submitted successfully!”
Be sure to double-check your autograder results before moving on!