Exploratory Data Analysis
Problem to Solve
Exploratory data analysis (EDA) is a method used by data scientists to find interesting characteristics of data and test hypotheses. It’s often one of the initial steps a researcher might take in a larger data analysis program.
You now have many of the tools you need to perform your own exploratory data analysis! In a file called eda.R, in a folder called eda, write a program to visualize and explore a data set of your choice.
Getting Started
For this problem, you’ll need to create eda.R in a folder called eda.
Create eda.R
Open RStudio per the linked steps and navigate to the R console:
>
Next, execute getwd() to print your working directory. Ensure your current working directory is where you’d like to create this problem’s folder. If using RStudio through cs50.dev, the recommended directory is /workspaces/NUMBER, where NUMBER is a number unique to your codespace.
If you do not see the right working directory, use setwd to change it! If you are in the working directory of another problem, try typing setwd(".."), which will move you one directory higher.
Next, execute dir.create("eda") to create a folder called eda in your codespace.
Now type setwd("eda") followed by Enter to move yourself into (i.e., open) that directory. Your working directory should now end with eda/.
Finally, type file.create("eda.R") to create a file called eda.R inside of the eda folder.
If all was successful, executing list.files() should show eda.R. If not, retrace your steps and see if you can determine where you went wrong!
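The steps above can also be collected into one small function. This is only a sketch: setup_eda is a hypothetical helper name, not something the course provides, and it assumes you pass it the directory in which the eda folder should live. It skips any step that has already been done, so it is safe to run more than once.

```r
# Hypothetical helper (not part of the problem): creates the eda folder and an
# empty eda.R inside `parent`, skipping any step that is already done.
setup_eda <- function(parent = getwd()) {
  folder <- file.path(parent, "eda")
  if (!dir.exists(folder)) {
    dir.create(folder)
  }

  script <- file.path(folder, "eda.R")
  if (!file.exists(script)) {
    file.create(script)
  }

  # Return the folder's contents so you can confirm eda.R is there.
  list.files(folder)
}
```

Calling setup_eda() with no arguments uses your current working directory. It does not change that directory, so you would still type setwd("eda") afterwards.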
Specification
The only requirement for this problem is that you produce a visualization you care about, that is interesting to you, and that you feel proud of. Oh, and that you save the visualization in a file called visualization.png!
You might find it helpful to get a bit of inspiration:
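As one possible starting point (not the course’s own example), here is a minimal sketch of what eda.R might contain. It assumes the built-in mtcars data set as a stand-in for a data set of your choice, and it uses base R graphics with the png() device to write visualization.png. If you prefer ggplot2, its ggsave() function is another way to save a plot.

```r
# Assumption: the built-in mtcars data set stands in for a data set of your choice.
data_set <- mtcars

# Explore first: summary statistics for fuel efficiency and weight.
print(summary(data_set[, c("mpg", "wt")]))

# Open a PNG graphics device so the plot is written to visualization.png.
png("visualization.png", width = 800, height = 600)

plot(
  data_set$wt, data_set$mpg,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  main = "Fuel efficiency versus weight (mtcars)"
)

# Close the device; the file is only complete after this call.
dev.off()
```

The file is written to your current working directory, so run this from inside the eda folder.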
Usage
Assuming eda.R is in your working directory, enter the below in the R console to test your program:
source("eda.R")
How to Test
There isn’t a specific way to test your code other than trial and error: run your program and adjust it until it produces the visualization you’re looking for!
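One thing you can check mechanically is that running your program really does leave behind a non-empty visualization.png. The sketch below assumes a hypothetical helper named run_and_check. It is not part of check50 and says nothing about whether the plot looks the way you want.

```r
# Hypothetical helper: sources a script, then reports whether the expected
# image file exists and is not empty.
run_and_check <- function(script = "eda.R", output = "visualization.png") {
  if (!file.exists(script)) {
    stop("Cannot find ", script, " in ", getwd())
  }

  # Run the program exactly as the Usage section describes.
  source(script)

  file.exists(output) && file.size(output) > 0
}
```

With eda.R in your working directory, run_and_check() returns TRUE when the image was written and FALSE otherwise.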
check50
You can also check your code using check50, a program that CS50 will use to test your code when you submit. But be sure to test it yourself as well!
Run the following command in the RStudio console:
check50("cs50/problems/2024/r/eda")
Green smilies mean your program has passed a test! Red frownies will indicate your program output something unexpected. Visit the URL that check50 outputs to see the input check50 handed to your program, what output it expected, and what output your program actually gave.
How to Submit
You can submit your code using submit50.
Keeping in mind the course’s policy on academic honesty, run the following command in the RStudio console:
submit50("cs50/problems/2024/r/eda")