This repository has been archived by the owner on Apr 7, 2022. It is now read-only.

ST-314-nilsstreedain/DA1

Data Analysis 1 - Introduction to RStudio

This first data analysis assignment is intended to help you familiarize yourself with the RStudio interface and get comfortable running small chunks of R code. If you have not done so already, please work through the R Tutorial in the Start Here module on Canvas.

For this assignment, you will need the Intro_to_RStudio.R file and the loan50.csv dataset. Both can be found on the Data Analysis 1 assignment page.

You can find a description of the variables recorded in the loan50.csv dataset on the OpenIntro Statistics loan50 info page.

The Intro_to_RStudio.R script walks you through using some of the basic, built-in functions in R. Read through and run each line of code to ensure you understand what the functions are doing and what types of output each produces. The script uses two variables: loan_amount and homeownership. For the assignment you’ll submit, you will practice using two different variables. Please make sure the assignment you submit uses the correct variables (specified in the questions below).

Part 1 - Exploring a Single Quantitative Variable

For this portion of the assignment, you’ll practice using R to explore the annual_income variable in the loan50.csv dataset.

  1. (2 points) Construct a histogram of the annual income data. Include informative labels and a title. Include your histogram below. To copy or save a graph from RStudio, click the Export button just above the preview of the graph. From there you can choose to Save Image or Copy to Clipboard.
  2. (2 points) Construct a boxplot of the annual income data. Include informative labels and a title. Include your boxplot below.
  3. (2 points) Using the histogram you constructed in question 1 and the boxplot from question 2, describe the shape of the distribution of the annual income variable and comment on the presence of any outliers.
  4. (1.25 points) Calculate the mean of the annual income data.
  5. (1.25 points) Calculate the median of the annual income data.
  6. (2 points) Which measure of center (mean or median) is more appropriate for these data? Why? Consider the shape of the distribution discussed in question 3.
  7. (1.25 points) Calculate the standard deviation of the annual income data.
  8. (1.25 points) Calculate the interquartile range of the annual income data.
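As a rough illustration of the base R functions these questions call for, here is a minimal sketch. It uses a small made-up vector named `income` (a hypothetical name with hypothetical values, not data from loan50.csv), so it runs on its own. With the real data you would pass the annual_income column instead, for example `loans$annual_income`, where `loans` is an assumed name for the data frame you loaded.

```r
# Hypothetical toy incomes in dollars; these are NOT values from loan50.csv.
income <- c(32000, 45000, 51000, 58000, 60000, 72000, 85000, 250000)

# Histogram with a title and axis labels.
h <- hist(income,
          main = "Histogram of Annual Income (toy data)",
          xlab = "Annual income (dollars)",
          ylab = "Frequency")

# Boxplot; points far beyond the box (more than 1.5 box lengths) are
# drawn individually as potential outliers and returned in b$out.
b <- boxplot(income,
             main = "Boxplot of Annual Income (toy data)",
             ylab = "Annual income (dollars)")

# Measures of center.
income_mean   <- mean(income)
income_median <- median(income)

# Measures of spread.
income_sd  <- sd(income)
income_iqr <- IQR(income)
```

In this toy vector the single large value pulls the mean (81625) well above the median (59000), which is the kind of behavior to look for when judging which measure of center suits a skewed distribution.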

Part 2 – Visualizing Two Variables

Let’s continue to explore the annual income data, but now consider how annual income varies by loan status (current or fully paid).

  1. (2 points) Construct a side-by-side boxplot for annual income broken up by loan status. Include informative labels and a title.
  2. (2 points) How do the distributions of annual income compare across the loan status groups? Comment on the shape, center, spread, and presence of outliers for the two groups.
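One way to draw a side-by-side boxplot in base R is the formula interface to `boxplot()`. The sketch below assumes a small made-up data frame named `toy` (hypothetical name and values); only the column names annual_income and loan_status are taken from the assignment.

```r
# Hypothetical toy data frame; column names mirror the assignment,
# but every value here is made up for illustration.
toy <- data.frame(
  annual_income = c(40000, 52000, 61000, 75000,
                    48000, 66000, 90000, 58000),
  loan_status   = factor(c("Current", "Current", "Current", "Current",
                           "Fully Paid", "Fully Paid", "Fully Paid",
                           "Fully Paid"))
)

# The formula y ~ group draws one box per level of the grouping variable.
bp <- boxplot(annual_income ~ loan_status, data = toy,
              main = "Annual Income by Loan Status (toy data)",
              xlab = "Loan status",
              ylab = "Annual income (dollars)")

# Group medians, as a numeric check on the centers shown in the plot.
group_medians <- tapply(toy$annual_income, toy$loan_status, median)
```

The numeric summary from `tapply()` is optional; it is included here only as a way to back up what you read off the plot.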

Part 3 – Exploring a Single Categorical Variable

Finally, we’ll focus our attention only on the loan status variable.

  1. (2 points) Construct a table of counts for the loan status variable. Report the number of observations in each category below.
  2. (2 points) Construct a table of proportions for the loan status variable. Report the proportions for each category below.
  3. (2 points) Construct a barplot that displays the distribution of loan status types. Include informative labels and a title. Include your barplot below.
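The three steps above can be sketched with `table()`, `prop.table()`, and `barplot()`. The vector `status` below is a hypothetical stand-in with made-up values, not the loan_status column from loan50.csv.

```r
# Hypothetical toy loan statuses; these are NOT values from loan50.csv.
status <- c("Current", "Current", "Fully Paid", "Current", "Current",
            "Fully Paid", "Current", "Current", "Current", "Fully Paid")

# Table of counts: number of observations in each category.
counts <- table(status)

# Table of proportions: each count divided by the total.
props <- prop.table(counts)

# Barplot of the counts with a title and axis labels.
mids <- barplot(counts,
                main = "Distribution of Loan Status (toy data)",
                xlab = "Loan status",
                ylab = "Number of loans")
```

Passing `props` to `barplot()` instead of `counts` would put proportions on the vertical axis; either choice displays the same distribution.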

Gradescope Page Matching (2 points)

When you upload your PDF file to Gradescope, you will need to match each question on this assignment to the correct pages. Video instructions for doing this are available in the Start Here module on Canvas on the page “Submitting Assignments in Gradescope”. Failure to follow these instructions will result in a 2-point deduction on your assignment grade. Match this page to outline item “Gradescope Page Matching”.