genematx/Udacity_P1_StackOverflow

*** WORK IN PROGRESS ***

Table of Contents

  1. Description
  2. Installation
  3. Project Motivation
  4. File Descriptions
  5. Results
  6. Licensing, Authors, and Acknowledgements

Description

This repository contains the code used in Project 1 of the Udacity Data Science Nanodegree.

Installation

The code is written and tested in Python 3.5. Additional libraries to be installed beyond the standard Anaconda distribution are:

The original dataset can be downloaded from the StackOverflow page. Unpack the archives and place their contents directly in the folder ./data/developer_survey_20xx for each year (without subfolders).
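As a rough sketch of how the folder layout above could be checked before running the notebooks, the snippet below lists the files found for each survey year. The helper name `find_survey_files` and the assumption that the unpacked files are CSVs are illustrative only; they are not taken from the repository code.

```python
import re
from pathlib import Path

# Folder naming follows the instructions above; the rest is an assumption.
YEAR_DIR = re.compile(r"developer_survey_(20\d{2})")


def find_survey_files(data_root="./data"):
    """Map each survey year to the sorted list of CSV files in its folder.

    Hypothetical helper: assumes the unpacked archives contain .csv files.
    """
    files_by_year = {}
    root = Path(data_root)
    if not root.is_dir():
        return files_by_year
    for folder in sorted(root.iterdir()):
        match = YEAR_DIR.fullmatch(folder.name)
        if folder.is_dir() and match:
            # Only top-level files are listed, because the archives are
            # expected to be unpacked without subfolders.
            files_by_year[int(match.group(1))] = sorted(folder.glob("*.csv"))
    return files_by_year


if __name__ == "__main__":
    for year, paths in find_survey_files().items():
        print(year, [p.name for p in paths])
```

A year whose folder exists but shows an empty list most likely means the archive was unpacked into a subfolder.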

Project Motivation

The primary motivation for this project is to learn and gain practical experience working with messy data in Python. To do this, we combine (fuse) the StackOverflow survey datasets from the previous decade and try to uncover useful insights about how the field of software engineering has changed. Specifically, we look at:

  1. How did the popularity of different programming languages change over the years?
  2. What drives the increasing popularity of Python as a programming language?
  3. Are developers happy with their jobs, and if not, why?
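To make the first question concrete, here is a minimal sketch of one way language popularity could be compared across years once the yearly data has been fused. It assumes each respondent's languages arrive as a single delimiter-separated string, which may not hold for every survey year; the function name `language_share` and the toy answers are made up for illustration and are not survey results or the notebooks' actual method.

```python
from collections import Counter


def language_share(responses_by_year, sep=";"):
    """Return {year: {language: share of respondents who listed it}}.

    Assumes one delimiter-separated string per respondent (an assumption
    about the data format); empty or missing answers are ignored.
    """
    shares = {}
    for year, responses in responses_by_year.items():
        counts = Counter()
        answered = 0
        for raw in responses:
            if not raw:
                continue
            # A set, so each respondent counts at most once per language.
            languages = {part.strip() for part in raw.split(sep) if part.strip()}
            if not languages:
                continue
            answered += 1
            counts.update(languages)
        shares[year] = (
            {lang: n / answered for lang, n in counts.items()} if answered else {}
        )
    return shares


if __name__ == "__main__":
    # Toy answers, invented purely to show the call.
    toy = {2017: ["Python; Java", "Java", None], 2018: ["Python;C++", "Python", ""]}
    for year, share in language_share(toy).items():
        print(year, share)
```

Dividing by the number of respondents who answered, rather than by all rows, keeps years with many skipped answers comparable to years with few.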

File Descriptions

The data processing is split between several Jupyter notebooks.

There are two notebooks available here to showcase work related to the above questions. Each notebook explores the data pertaining to the question named in its title. Markdown cells walk through the thought process behind individual steps.

There is an additional .py file that runs the necessary code to obtain the final model used to predict salary.

Results

The main findings of the analysis can be found at the post available here.

Licensing, Authors, and Acknowledgements

Credit goes to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!

