gtolomei/python-for-datascience

Python Programming for Data Science

General Info

Welcome to Python Programming for Data Science!

This is a first-year course of the MSc in Data Science at the University of Padova. It is one of the three modules that make up the course "Fundamentals of Information Systems".

This repository contains lecture materials (Jupyter notebooks and PDF slides) as well as exercises from the 2018-19 examination sessions (with solutions).

Course Goal

The goal of this module is to teach the basics of the Python programming language with a special focus on Data Science. In particular, students will become familiar with Python packages that are widely used by data scientists and machine learning practitioners, such as numpy, scipy, pandas, seaborn, and scikit-learn.
By the end of this module, students are expected to be able to implement all the stages of a typical machine learning pipeline: from collecting data to building predictive models for solving either a regression or a classification problem.
A full detailed description of the course is available here.

Course Syllabus

Python Programming for Data Science provides students with the foundational coding skills they need as data scientists.

We start our journey with an exhaustive tutorial on how to properly set up your environment, which is used throughout the class. Essentially, this consists of:

  • Installing Python 3.x (we will be using Python 3.6 installed via Anaconda in this class)
  • Installing and setting up Jupyter Notebook
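
Once the setup is done, a quick sanity check can confirm that the interpreter and packages are in place. The following is a minimal sketch, not part of the course material: the helper names `python_is_recent_enough` and `missing_packages` are hypothetical, and the package list is simply taken from the packages mentioned in this README (`sklearn` and `notebook` are the import names of scikit-learn and Jupyter Notebook).

```python
import importlib.util
import sys

# Import names of the packages mentioned in this README (illustrative list).
COURSE_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "sklearn", "notebook"]


def python_is_recent_enough(minimum=(3, 6)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_packages(names=COURSE_PACKAGES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Recent enough:", python_is_recent_enough())
    print("Missing packages:", missing_packages() or "none")
```

Running this inside the Anaconda environment should report no missing packages; anything listed can usually be installed with `conda install` or `pip install`.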

Then, we move to discussing the basics of the Python programming language:

  • Python object model
  • built-in data types
  • functions
  • I/O
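
To give a flavour of these four topics, here is a small illustrative sketch using only the standard library. It is not taken from the lecture notebooks; the variable names and the `mean` helper are made up for this example, and an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import io

# Object model: names are references to objects; assignment does not copy.
a = [1, 2, 3]
b = a                 # b refers to the same list object as a
b.append(4)           # the change is visible through both names
same_object = a is b  # identity check: one object, two names

# Built-in data types: a dict holding a str, a tuple, and a set.
record = {"name": "Ada", "scores": (28, 30), "tags": {"msc", "python"}}


# Functions: defined with `def`, documented with a docstring.
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# I/O: file-like objects share one interface (write, seek, read).
buffer = io.StringIO()
buffer.write("name,mean_score\n")
buffer.write(f"{record['name']},{mean(record['scores'])}\n")
buffer.seek(0)
lines = buffer.read().splitlines()
print(lines)
```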

Finally, we dig into a set of widely used Python packages for data science, such as:

  • numpy/scipy (for numerical/scientific computing)
  • pandas (for data manipulation)
  • matplotlib/seaborn (for data visualization)
  • scikit-learn (for machine learning tasks like regression and classification).
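
As a rough illustration of how some of these packages fit together in a regression pipeline, consider the sketch below. It is an assumption-laden toy example rather than course material: the data are synthetic (generated with numpy instead of collected from a real source), the model is a plain linear regression, and visualization with matplotlib/seaborn is left out.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Collect data: synthetic points around the line y = 3x + 2 (an assumption).
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=200)

# 2. Prepare data: hold it in a pandas DataFrame and drop rows with missing values.
df = pd.DataFrame({"x": x, "y": y}).dropna()

# 3. Split into training and test sets (75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(
    df[["x"]], df["y"], test_size=0.25, random_state=0
)

# 4. Fit a predictive model for the regression problem.
model = LinearRegression().fit(X_train, y_train)

# 5. Evaluate on the held-out test set.
score = r2_score(y_test, model.predict(X_test))
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, R^2={score:.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data; a classification problem follows the same steps with a classifier and a classification metric in place of steps 4 and 5.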

Class Schedule

| Lecture # | Topics | Class Material |
|---|---|---|
| Lecture 0 | Preliminary computer science concepts | Notebook, Slides |
| Lecture 1 | Introduction and environment setup | Notebook, Slides |
| Lecture 2 | Python basics | Notebook, Slides |
| Lecture 3 | Python's built-in data types (Part I) | Notebook, Slides |
| Lecture 4 | Python's built-in data types (Part II) | Notebook, Slides |
| Lecture 5 | Functions & I/O | Notebook, Slides |
| Lecture 6 | numpy package | Notebook, Slides |
| Lecture 6b | Review of linear algebra basics | Notebook, Slides |
| Lecture 7 | Introduction to pandas package | Notebook, Slides |
| Lecture 8 | I/O with pandas | Notebook, Slides |
| Lecture 9 | Data preparation with pandas | Notebook, Slides |
| Lecture 10 | Data visualization with matplotlib | Notebook, Slides |
| Lecture 11 | A Machine Learning Primer (seminar) | Notebook, Slides |
| Lecture 12 | The Regression Problem: Example (Part I) | Notebook |
| Lecture 13 | The Regression Problem: Example (Part II) | Notebook |
| Lecture 14 | The Classification Problem: Example (Part I) | Notebook |
| Lecture 15 | The Classification Problem: Example (Part II) | Notebook |
| Lecture 16 | Logistic Regression Demystified (seminar) | Slides |
