mmingalov/README.md

Hi there 👋

My name is Max. For more than 11 years I worked in the mining and geology domain, where I developed business in the Far East and Siberian regions of Russia and implemented data management systems (more details in my article for Globus magazine, on page 152). I have now returned to fintech and work as a big data engineer. I continuously improve my skills in data analysis (DA) and data engineering (DE).

🌱 ...I've completed the Big Data Analytics path and the Data Engineering path...

🛠️ Languages and Tools I use:

Postgres, Microsoft SQL Server, MySQL, Apache Airflow
Apache Spark, Apache Hadoop, Apache Hive
Python, NumPy, Pandas, Plotly
Microsoft Excel, Tableau

SUMMARY:

In this profile you can see the key projects and tasks I have completed during my career and education. I've grouped them in the list below.

Python

  1. Small ETL Windows app that converts an Excel protocol file to a CSV file with the required structure. Stack: pandas, re, tkinter https://github.com/mmingalov/micromine-lab-protocols

  2. On start, the spider receives two links to VK.com user accounts. The task is to find the shortest chain of handshakes between them, built from mutual friends. The task was solved in two ways: a) Scrapy spider; b) recursion. Stack: scrapy, MongoDB https://github.com/mmingalov/geekbrains-methods-data-collection-from-internet/tree/master/course_work

  3. An Apache Airflow DAG finds the three locations with the most residents in the Rick and Morty API and writes the results to a Greenplum DB table. Stack: Greenplum DB, Airflow https://github.com/mmingalov/kc-airflow/blob/main/dags/m-mingalov/m-mingalov_5_Rick_and_Morty.py
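The shortest-chain-of-handshakes task from item 2 can be illustrated with a breadth-first search over a friendship graph. This is only a minimal sketch, not the code in the repository: the project itself uses a Scrapy spider and a recursive variant, while here the friend lists are assumed to be already collected into an in-memory dictionary. The function name `shortest_chain` and the sample users are hypothetical.

```python
from collections import deque


def shortest_chain(friends, start, goal):
    """Return the shortest list of users linking start to goal, or None.

    `friends` maps a user to an iterable of that user's friends
    (hypothetical in-memory stand-in for data scraped from the network).
    """
    if start == goal:
        return [start]
    parents = {start: None}  # user -> user we reached them from
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for friend in friends.get(user, ()):
            if friend in parents:
                continue  # already reached by an equal or shorter path
            parents[friend] = user
            if friend == goal:
                # Walk the parent links back to the start.
                chain = [friend]
                while parents[chain[-1]] is not None:
                    chain.append(parents[chain[-1]])
                return chain[::-1]
            queue.append(friend)
    return None


# Made-up sample graph, for illustration only.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}
chain = shortest_chain(friends, "alice", "erin")
print(chain)
```

Breadth-first search visits users in order of distance from the start, so the first time the goal is reached the chain is guaranteed to be one of the shortest.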

Data Analysis

  1. Competition task: predict the mean math exam result (from 0 to 100 points) for students of the tutors in test.csv. Metric: coefficient of determination (R²). Several solutions are provided in a Jupyter notebook file. https://github.com/mmingalov/geekbrains-data-analysis-alg/tree/master/tutors-expected-math-exam-results

  2. Model for predicting real estate (house) prices. Price is the target variable. The output predictions file includes two columns: Id and Price. Several solutions are provided in Jupyter notebook files. https://github.com/mmingalov/geekbrains-python-data-science/tree/master/course_project
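For reference, the determination coefficient used as the metric in item 1 can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses plain Python and made-up numbers; the helper name `determination_coefficient` is hypothetical and is not taken from the notebooks.

```python
def determination_coefficient(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up exam scores and predictions, for illustration only.
y_true = [50, 60, 70, 80]
y_pred = [52, 58, 71, 79]
r2 = determination_coefficient(y_true, y_pred)
print(round(r2, 2))
```

A value of 1 means the predictions match the targets exactly, while a model that always predicts the mean of the targets scores 0.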

Tableau

I created this dashboard to track the results of my investment deals. https://public.tableau.com/app/profile/maxim.mingalov/viz/IISv3/Dashboard1?publish=yes

BIG DATA

MapReduce

MapReduce task using Python https://github.com/mmingalov/kc-hadoop/tree/master/homework_lesson5
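As a rough illustration of the MapReduce pattern in Python, the sketch below runs the map, shuffle (sort) and reduce phases in a single process. The word-count job and the function names `mapper` and `reducer` are assumptions chosen for the example; the actual homework task may be different, and on a cluster the two phases would normally run as separate scripts.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce phase: sum the counts of each key.

    Sorting stands in for the shuffle step, which delivers all
    pairs with the same key to the reducer together.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


# Made-up input lines, for illustration only.
lines = ["big data big spark", "spark hive"]
counts = dict(reducer(mapper(lines)))
print(counts)
```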

Hive

Practice creating partitioned tables and views for a taxi dataset https://github.com/mmingalov/kc-hadoop/tree/master/homework_lesson7

Spark

  1. Project from my studies at the 'Big Data Analytics' faculty. A detailed description is in the PowerPoint files (Russian and English versions), and the execution steps are in the 'final project executing.docx' file. https://github.com/mmingalov/geekbrains-final-project

  2. This model rates clients and decides whether or not to grant them credit. https://github.com/mmingalov/kc-big-ML/tree/main/4_1_Bank_credit_scoring

  3. Model for predicting the value of the maximum loan amount based on client data. https://github.com/mmingalov/kc-big-ML/tree/main/4_2_Bank_credit_rate

  4. PySpark profiler for collecting additional statistics on table columns. https://github.com/mmingalov/spark-profiler/
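To give an idea of what column profiling from item 4 involves, here is a plain-Python sketch (deliberately not PySpark, so it runs without a cluster) that computes a few per-column statistics for a small table. Which statistics the real profiler reports is not stated here; the row count, null count and distinct count below, as well as the name `profile_columns`, are assumptions made for illustration.

```python
def profile_columns(rows):
    """Return {column: {"count", "nulls", "distinct"}} for a list of dict rows."""
    columns = sorted({name for row in rows for name in row})
    stats = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        non_null = [value for value in values if value is not None]
        stats[name] = {
            "count": len(values),                   # total rows
            "nulls": len(values) - len(non_null),   # missing values
            "distinct": len(set(non_null)),         # unique non-null values
        }
    return stats


# Made-up table, for illustration only.
rows = [
    {"id": 1, "city": "A"},
    {"id": 2, "city": None},
    {"id": 3, "city": "A"},
]
stats = profile_columns(rows)
print(stats)
```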

Popular repositories

  1. VBM Public

    Development and implementation of a financial data warehouse using Microsoft SQL Server technology. Creating a set of business views for building analytical reports including: - Monitoring, forecasting an…

    TSQL 1 1

  2. geekbrains-spark-streaming Public

    Structured streaming

    Python 1 1

  3. hello-world Public

    HTML

  4. geekbrains-db Public

    Repository for the GeekBrains course "Databases"

    TSQL

  5. geekbrains-math Public

    Introduction to Higher Mathematics

    Python

  6. geekbrains-python-data-science Public

    Python libraries for Data Science: Numpy, Matplotlib, Scikit-learn

    Jupyter Notebook 3