This project works through data analysis concepts and techniques from a course on www.suanlab.com. The materials cover core data analysis skills that apply to any field or industry.
- Exploratory Data Analysis - Initial investigation into datasets. Counting values, finding distributions, spotting outliers, etc.
- Data Preprocessing - Formatting, cleaning and restructuring data before analysis.
- Data Cleaning - Identifying and fixing issues like missing values, duplicates, inconsistent formatting, etc.
- Data Integration - Combining two or more datasets together. Requires ensuring the data can be merged, handling overlap, etc.
- Data Reduction - Using summarization, aggregation, dimension reduction, etc. to make datasets smaller and more manageable.
- Data Transformation - Changing the form, structure or scale of data to suit your needs. Log transforms, scaling, binning, etc.
- Feature Engineering - Creating new features from raw data to use in models.
- OpenRefine - Tool for data cleaning and transformation.
- NumPy - Library for scientific computing in Python. Used for statistics, matrices, and more.
- Pandas - Library for data analysis and manipulation in Python. Built on NumPy.
- Exploring and Visualizing the Titanic Dataset - Putting the skills into action on the Titanic Kaggle dataset.
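
As a rough illustration of how several of the steps above fit together, here is a minimal sketch using Pandas and NumPy. The toy tables, column names (`id`, `age`, `fare`, `pclass`), and bin edges are made up for this example and are not taken from the course materials or the real Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
passengers = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "age": [22.0, 38.0, 38.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 71.28, 8.05, 53.10],
})
cabins = pd.DataFrame({"id": [1, 2, 3, 4], "pclass": [3, 1, 3, 1]})

# Exploratory data analysis: summary statistics and missing-value counts.
summary = passengers.describe()
missing = passengers.isna().sum()

# Data cleaning: drop exact duplicate rows, fill missing ages with the median.
clean = passengers.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# Data integration: combine the two tables on their shared key.
merged = clean.merge(cabins, on="id", how="left")

# Data transformation: log-scale the fare and bin the age.
merged["log_fare"] = np.log1p(merged["fare"])
merged["age_group"] = pd.cut(
    merged["age"], bins=[0, 30, 60], labels=["young", "adult"]
)

# Feature engineering: derive a simple boolean feature from an existing column.
merged["is_first_class"] = merged["pclass"] == 1

# Data reduction: aggregate to one average fare per passenger class.
fare_by_class = merged.groupby("pclass")["fare"].mean()

print(missing)
print(merged)
print(fare_by_class)
```

Filling missing ages with the median is just one possible choice here; depending on the dataset, dropping the rows or using a group-wise value may be more appropriate.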
The course uses: Colab, Python, NumPy, Pandas, Matplotlib, Seaborn, and OpenRefine.
The projects analyze and visualize various public datasets to demonstrate concepts and techniques.
I found this course very helpful for building a basic foundation in qualitative and quantitative data analysis using Python. Please let me know if you have any feedback or suggestions for improving my data analysis skills!
Please visit the original site (www.suanlab.com) for more in-depth tutorials and resources.