All the files necessary to use this project are available on OneDrive.
To set up this repository on your computer, do the following (a sample command sequence is shown after the list):
- Clone the repository to your computer
- Download the prepared volumes from OneDrive, and extract the zip file
- From the root of the project, run import/import.cmd
- Use the location of the extracted volumes as input, such as C:/Users/Name/Desktop/Volumes
- Start the cluster using start.cmd
- To stop the cluster, we recommend using stop.cmd to avoid corrupting data in HBase
- Wait for HDFS to exit safemode and HBase to initialize. This might take a few minutes
- You can now view different visualizations on localhost
- Go to the admin panel
- Ensure that all files necessary for the job are uploaded under "Upload" with the type "Spark application"
- If using the volumes provided on OneDrive, this has already been done
- If running jobs from scratch, upload the driver for each Python job to HDFS as a .py file
- The code for all Python files must be uploaded to the same directory as a single zip archive named files
- Jar libraries on which the Spark applications depend must also be uploaded to that directory; these jars are available on OneDrive (a sketch of these uploads is given after this list)
- Under "Submit Spark application", write the name of the job to execute, such as incident_aggregator and press "Submit"
- The status of the job is most easily tracked in Livy or via "Spark job status" on the admin page; a direct query against the Livy REST API is also sketched below
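For reference, a full setup in a Windows command prompt might look roughly like the following; the repository URL, folder names, and the safemode check are placeholders or assumptions, not exact project values:

```cmd
:: Clone the repository and move into it (URL is a placeholder)
git clone https://github.com/<your-fork>/<this-repo>.git
cd <this-repo>

:: Import the prepared volumes, giving the folder where the OneDrive zip was extracted as input
import\import.cmd
:: e.g. C:\Users\Name\Desktop\Volumes

:: Start the cluster
start.cmd

:: Optional: check whether HDFS has left safemode (requires an hdfs client, e.g. inside the namenode container)
hdfs dfsadmin -safemode get

:: Stop the cluster cleanly when you are done, to avoid corrupting data in HBase
stop.cmd
```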
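If you are preparing the job files yourself, the uploads described above can also be done with plain HDFS commands. In the sketch below, the /jobs directory, the driver and jar names, and the files.zip archive name are assumptions; use the directory that the admin panel's "Upload" feature targets and the actual jars from OneDrive:

```cmd
:: Upload the driver for a Python job as a .py file (directory and file name are examples)
hdfs dfs -mkdir -p /jobs
hdfs dfs -put incident_aggregator.py /jobs/

:: Upload the code for all Python files as a single zip archive named files (files.zip assumed)
hdfs dfs -put files.zip /jobs/

:: Upload the jar libraries the Spark applications depend on (available on OneDrive)
hdfs dfs -put shc-core.jar /jobs/
```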
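Livy can also be queried directly over its REST API; the example below assumes Livy is reachable on localhost at its default port 8998:

```cmd
:: List all batch sessions known to Livy
curl http://localhost:8998/batches

:: Check the state of a specific batch by its id (0 is just an example)
curl http://localhost:8998/batches/0/state
```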
For your convenience, all images have been uploaded to DockerHub.
If, for some reason, you wish to build images yourself, you must download the SHC connector from OneDrive and place it under pysparkApp/. This file is too large for GitHub, and our public fork does not permit the use of LFS.
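A possible local build flow, assuming pysparkApp/ contains a Dockerfile and the cluster is defined with Docker Compose (adjust names to the actual setup), is:

```cmd
:: Place the SHC connector jar from OneDrive under pysparkApp/ before building

:: Build all images locally instead of pulling them from DockerHub
docker-compose build

:: Or build only the PySpark application image (the tag is just an example)
docker build -t pysparkapp ./pysparkApp
```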