Virtual-Diagnostic-Lab · jadhav-kunal · Oct 5, 2021
diff --git a/Detecting_parkinsons_disease.ipynb b/Detecting_parkinsons_disease.ipynb
@@ -0,0 +1 @@
+{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"code","source":"# Install the necessary packages (first run only)\n# !pip install numpy pandas scikit-learn xgboost --upgrade","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# This Python 3 environment comes with many helpful analytics libraries installed\n# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python\n# For example, here are several helpful packages to load\n\nimport numpy as np # linear algebra\nimport pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n\n# Input data files are available in the read-only \"../input/\" directory\n# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory\n\nimport os\nfor dirname, _, filenames in os.walk('/kaggle/input'):\n    for filename in filenames:\n        print(os.path.join(dirname, filename))\n\n# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using \"Save & Run All\" \n# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"* Import the necessary packages here","metadata":{}},{"cell_type":"code","source":"# OS and system modules\nimport os, sys","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Data Collection","metadata":{}},{"cell_type":"code","source":"# Read the data into a DataFrame\n\ndf = 
pd.read_csv('/kaggle/input/parkinsons.data')\ndf.tail() # shows the last 5 rows\n\n# Use df.head() for the first 5 rows","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# Describe the data (summary statistics)\n\ndf.describe()","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# Show the number of rows and columns and the non-null counts\n\ndf.info()","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"- The dataset contains 195 records and 24 columns","metadata":{}},{"cell_type":"code","source":"# Shape of the dataset (rows, columns)\n\ndf.shape","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Feature Engineering\n","metadata":{}},{"cell_type":"code","source":"# Get all features except \"status\"; [:, 1:] also drops the first column (the recording name)\n\nfeatures = df.loc[:, df.columns != 'status'].values[:, 1:] # .values returns a NumPy array\n\n\n\n# Get the status labels as a NumPy array\n\nlabels = df.loc[:, 'status'].values\n\n","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# Count how many records are labeled status 1 and how many status 0\n\ndf['status'].value_counts()","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"\n# Import the MinMaxScaler class from sklearn.preprocessing\n\nfrom sklearn.preprocessing import MinMaxScaler","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"\n# Initialize MinMaxScaler to scale each feature to the range -1 to 1\n\nscaler = MinMaxScaler((-1, 1))\n\n# fit_transform() fits the scaler to the data and\n# then transforms it.\n\nX = scaler.fit_transform(features)\ny = labels\n\n# Show X and y here\n# print(X, y)","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# Import train_test_split from sklearn.model_selection
\n\nfrom sklearn.model_selection import train_test_split","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# Split the dataset into training and testing sets, holding out 15% for testing\n\nx_train, x_test, y_train, y_test=train_test_split(X, y, test_size=0.15)\n","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Model Training\n","metadata":{}},{"cell_type":"code","source":"# Import XGBClassifier and the accuracy metric\n\nfrom xgboost import XGBClassifier\nfrom sklearn.metrics import accuracy_score","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"\n\n\n* To learn more, see **[\"Xtreme Gradient Boosting Algorithm\"](https://data-flair.training/blogs/gradient-boosting-algorithm/)**\n","metadata":{}},{"cell_type":"code","source":"# Create an instance and fit the model\n\nmodel = XGBClassifier()\nmodel.fit(x_train, y_train) # fit on the training features and labels\n","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Model Prediction\n","metadata":{}},{"cell_type":"code","source":"# Finally, predict on the test set and report accuracy\n\ny_prediction = model.predict(x_test)\n\nprint(\"Accuracy Score is\", accuracy_score(y_test, y_prediction) * 100)","metadata":{"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Summary","metadata":{}},{"cell_type":"markdown","source":"<p>\nIn this Python machine learning project, we learned to detect the presence of Parkinson’s Disease in individuals using various factors. We used an XGBClassifier for this and made use of the sklearn library to prepare the dataset. In one run this gave a test-set accuracy of <b> 96.66%</b>, which is strong for so few lines of code. Because the train/test split is not seeded, the exact figure varies between runs.\n</p>","metadata":{}}]}