This repository is dedicated to advanced data analysis and visualization of the COVID-19 pandemic, utilizing a synthetic dataset that reflects realistic trends in cases, deaths, and vaccination rates across five countries. The repository contains scripts in multiple programming languages, including R, Python, Julia, SAS, and Stata, each designed to explore different aspects of the COVID-19 data.
The dataset, synthetic_covid_data_realistic.csv
, comprises simulated daily data on COVID-19 cases, deaths, recovered cases, and vaccination rates for the United States, Brazil, India, Russia, and the United Kingdom from January 1, 2020, to December 31, 2021. The dataset was generated to mimic real-world dynamics and provide a basis for comprehensive data analysis and visualization exercises across various programming languages.
The covid19DataGetter.py
Python script is responsible for getting the synthetic COVID-19 dataset from public health sources.
-
CovidTrendAnalysis.R: This script performs a time-series analysis, plotting the trends of COVID-19 cases and deaths over time for each country.
-
VaccinationImpactAnalysis.R: This script investigates the impact of vaccination rates on COVID-19 cases and deaths, employing various statistical methods to infer correlations.
-
CovidTrendAnalysisPython.py: Analyzes COVID-19 case and death trends using Pandas for data manipulation and Matplotlib/Seaborn for plotting.
-
VaccinationImpactAnalysisPython.py: Explores the relationship between vaccination rates and case/death rates per capita.
CovidVisualizationJulia.jl: Utilizes the Plots.jl package to create comparative plots of cumulative cases over time and a heatmap of vaccination rates.
CovidTrendAnalysisSAS.sas: Generates time series plots for COVID-19 deaths and vaccination rates, highlighting the capabilities of SAS for data visualization.
CovidTrendAnalysisStata.do: Focuses on visualizing COVID-19 trends in cases and deaths over time, demonstrating Stata's graphing functionalities.
VaccinationImpactAnalysisStata.do: Analyzes the effect of vaccination on controlling the pandemic through scatter plots.
To use the scripts in this repository:
- Download the
synthetic_covid_data_realistic.csv
dataset from this GitHub repository. - Choose the script corresponding to the programming language and analysis you are interested in.
- Ensure you have the necessary environment and libraries/packages installed for the selected programming language.
- Run the script in your preferred development environment or command line interface.
Contributions to enhance the analysis, add new scripts, or improve data visualization are welcome. Please feel free to fork the repository, make your changes, and submit a pull request.
This project is open-source and available under the MIT License.
This project is for educational purposes only. All analyses and conclusions should be considered illustrative.
Also, note that the dataset that we are getting is changing as it is being updated, so keep in mind that our plots may change as well.
Created with ❤️ by Son Nguyen in 2024. Thank you for visiting!