greatsharma/MPG

MPG

Part 1, Exploratory Data Analysis (EDA):
This part covers summary statistics of the data, but the main focus is EDA: I use plots to extract information from the data and report the important insights. It is mostly about data analysis and business intelligence (BI). You can also follow this entire notebook on Kaggle.

Part 2, Statistical Analysis:
In this part I run a number of statistical hypothesis tests, apply estimation statistics, and interpret the results. I also check these results against the findings from Part 1. I apply both parametric and non-parametric tests. This part is all about data science and requires some statistical background. You can also follow this entire notebook on Kaggle.
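
As a rough illustration of the non-parametric side, here is a minimal sketch of a two-sided permutation test for a difference in group means. It is not necessarily one of the tests used in the notebook. The function name `permutation_test`, its parameters, and the mpg values are all invented for this example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Reassign the group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Invented mpg values for two hypothetical groups of cars (not from the dataset).
group_a = [30.1, 32.4, 28.9, 35.0, 31.2, 29.8]
group_b = [18.5, 20.2, 17.9, 22.1, 19.4, 21.0]
p_value = permutation_test(group_a, group_b)
print(f"p = {p_value:.4f}")
```

The test makes no normality assumption, which is why it is a common companion to a parametric test such as the t-test. A small p-value suggests the group labels matter for mpg.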

Part 3, Predictive Modelling:
In this part I predict mpg from the predictors. This part is all about machine learning: I train many data pipelines and models, then predict with the best pipeline and model found.
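
The idea of picking the best pipeline can be sketched as follows, assuming a simple hold-out split and mean squared error as the score. The notebook's actual pipelines, models, and metric may differ. The two "pipelines" here (a feature transform followed by a least-squares line), the helper names, and the data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def validation_mse(transform, train, valid):
    """Fit on the transformed training data and score on the held-out data."""
    x_train = [transform(w) for w, _ in train]
    y_train = [m for _, m in train]
    slope, intercept = fit_line(x_train, y_train)
    return mean((slope * transform(w) + intercept - m) ** 2 for w, m in valid)


# Invented (weight, mpg) pairs, for illustration only (not from the dataset).
train = [(1800, 40.0), (2200, 32.7), (2600, 27.7), (3000, 24.0),
         (3400, 21.2), (3800, 18.9), (4200, 17.1)]
valid = [(2000, 36.0), (2800, 25.7), (3600, 20.0), (4400, 16.4)]

# Each hypothetical pipeline: a feature transform, then the same linear model.
pipelines = {
    "weight": lambda w: w,
    "inverse_weight": lambda w: 1.0 / w,
}
scores = {name: validation_mse(f, train, valid) for name, f in pipelines.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Scoring on data that was not used for fitting is what makes the comparison between candidates fair. The same loop extends to more transforms or other model types.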

If you like these notebooks, please share them with others.

Data Description

The data used here is the Auto MPG dataset from the UCI Machine Learning Repository.

Information regarding data
    Title: Auto-Mpg Data
    Number of Instances: 398
    Number of Attributes: 9 including the class attribute
    Attribute Information:

    1. mpg:           continuous
    2. cylinders:     multi-valued discrete
    3. displacement:  continuous
    4. horsepower:    continuous
    5. weight:        continuous
    6. acceleration:  continuous
    7. model year:    multi-valued discrete
    8. origin:        multi-valued discrete
    9. car name:      string (unique for each instance)
    
    All the attributes are self-explanatory.
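
The attribute list above can be turned into a small row parser. This is a minimal sketch. It assumes the raw file is whitespace-separated, has the car name in double quotes, and marks missing numeric values with `?`, so check it against the file you actually download. The helper `parse_row` and the sample rows are invented for illustration.

```python
import shlex

# Column names follow the attribute list above (spaces replaced by underscores).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year", "origin", "car_name"]
DISCRETE = {"cylinders", "model_year", "origin"}


def parse_row(line, missing="?"):
    """Turn one whitespace-separated row into a dict keyed by attribute name."""
    # shlex keeps a double-quoted car name together as a single field.
    fields = shlex.split(line)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {}
    for name, raw in zip(COLUMNS, fields):
        if name == "car_name":
            row[name] = raw
        elif raw == missing:
            row[name] = None  # assumed missing-value marker
        elif name in DISCRETE:
            row[name] = int(raw)
        else:
            row[name] = float(raw)
    return row


# Hypothetical rows, invented for illustration only (not taken from the dataset).
SAMPLE = [
    '20.0 4 100.0 90.0 2500. 15.0 75 1 "example car a"',
    '30.0 4 90.0 ? 2000. 16.0 80 3 "example car b"',
]
rows = [parse_row(line) for line in SAMPLE]
print(rows[0]["mpg"], rows[1]["horsepower"])
```

Keeping the discrete attributes as integers and the continuous ones as floats mirrors the types listed above, which makes it easier to treat them as categorical or numerical later.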

The data is not complex and is well suited to analysis, as it has a good mix of categorical and numerical attributes.

data source

If you like this project, please star it and share it with others.