This project is part of the Udacity Data Analyst Nanodegree program.
In this project, you will use R and apply exploratory data analysis techniques to explore relationships among variables, moving from one variable to multiple variables, and to examine a selected data set for distributions, outliers, and anomalies.
To complete the project, you will need to install R, then download and install RStudio, and finally install a few packages. We recommend opening RStudio and installing the following packages from the R console:
- install.packages("ggplot2", dependencies = TRUE)
- install.packages("knitr", dependencies = TRUE)
- install.packages("dplyr", dependencies = TRUE)
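If some of these packages are already installed, the three commands above can be condensed into a check that installs only what is missing. The following is a minimal sketch, not part of the original instructions; the helper name `missing_packages` is hypothetical, and the install step is restricted to interactive sessions as a precaution.

```r
# Packages named in the list above.
required <- c("ggplot2", "knitr", "dplyr")

# Hypothetical helper: return the packages in `wanted` that are not in `installed`.
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(required)

# Only install from an interactive session (for example, the RStudio console).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

Running this a second time should install nothing, because `to_install` is then empty.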
I chose the White Wine Quality Data Set.
- A stream-of-consciousness analysis and exploration of the data.
a. Headings and text should organize your thoughts and reflect your analysis as you explored the data.
b. Plots in this analysis do not need to be polished with labels, units, and titles; these plots are exploratory (quick and dirty). They should, however, be of the appropriate type and effectively convey the information you glean from them.
c. You can iterate on a plot in the same R chunk, but you don’t need to show every plot iteration in your analysis.
- A section at the end called “Final Plots and Summary”
You will select three plots from your analysis to polish and share in this section. The three plots should show different trends and should be polished with appropriate labels, units, and titles (see the Project Rubric for more information).
- A final section called “Reflection”
This should contain a few sentences about your struggles, successes, and ideas for future exploration on the data set (see the Project Rubric for more information).
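The difference between a quick exploratory plot and a polished final plot can be sketched with ggplot2 as follows. This is only an illustration under stated assumptions: the data frame is synthetic stand-in data, and the column names `alcohol` and `quality`, the bin width, and the label text are assumptions rather than values taken from the actual data set.

```r
library(ggplot2)

# Synthetic stand-in data; column names and values are assumptions,
# not taken from the real wine data set.
set.seed(1)
wine <- data.frame(
  alcohol = runif(200, min = 8, max = 14),
  quality = sample(3:9, 200, replace = TRUE)
)

# Exploratory version: quick and dirty, no title or axis labels.
quick_plot <- ggplot(wine, aes(x = alcohol)) +
  geom_histogram(binwidth = 0.5)

# Polished version for "Final Plots and Summary": add a title,
# axis labels, and units to the same plot.
final_plot <- quick_plot +
  labs(
    title = "Distribution of alcohol content",
    x = "Alcohol (% by volume)",
    y = "Number of wines"
  )

print(final_plot)
```

Building the polished plot on top of the exploratory one keeps the two versions consistent, so only the labels change between the analysis and the final section.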
- The RMD file containing the analysis (final plots and summary, and reflection)
- The HTML file knitted from the RMD file using the knitr package
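Besides the Knit button in RStudio, the HTML file can also be produced from the R console. The sketch below assumes the rmarkdown package is installed (it is not in the package list above, although RStudio normally prompts to install it) and that pandoc is available, as it is inside RStudio; the file name `wine_analysis.Rmd` is hypothetical. `rmarkdown::render()` runs knitr on the code chunks and then converts the result to HTML.

```r
# Hypothetical file name; replace it with the actual RMD file.
rmd_file <- "wine_analysis.Rmd"

if (file.exists(rmd_file)) {
  # Runs knitr on the code chunks, then converts the output to HTML.
  rmarkdown::render(rmd_file, output_format = "html_document")
} else {
  message("File not found: ", rmd_file)
}
```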