
Showcases how the dask distributed actor architecture can be leveraged for a high performance parallel computing cluster


raj-chak/dask_parallel_compute


Dependencies on both scheduler and worker nodes:
python3-pip
dask[complete]

Additional dependency on the scheduler/driver node:
flask
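
//Install the dependencies, e.g. on Debian/Ubuntu (assuming pip package names match the list above):
apt-get install python3-pip
pip3 install "dask[complete]" flask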

//Run the dask scheduler on the scheduler node:
dask-scheduler &

//Run the dask workers on the worker node(s):
dask-worker scheduler_node_ip:8786 --name 0 &
dask-worker scheduler_node_ip:8786 --name 1 &


//Start the flask-based application on the scheduler node (also the driver node in this case).
//Place the numpy array files (0.npy and 1.npy) on the worker node(s) in the folder specified in the code below:
// file: daskClientAssetCorr.py
// line 14: nmp_dir = '/root/dask/numpy/'
nohup python3 daskClientAssetCorr.py &
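
//For orientation, a minimal sketch of what the driver side can look like: a flask endpoint backed by one dask actor per worker,
//each actor pinned to its named worker and loading its own partition file. The class name CorrActor, the placeholder corr()
//method, and the hard-coded two-worker loop are illustrative assumptions; the actual logic lives in daskClientAssetCorr.py.

# Hypothetical sketch only -- the real implementation is daskClientAssetCorr.py.
from distributed import Client
from flask import Flask, request, jsonify
import numpy as np

nmp_dir = '/root/dask/numpy/'              # partition files live on each worker node

class CorrActor:
    # Stateful dask actor: loads this worker's partition once, then serves queries.
    def __init__(self, part_id):
        self.data = np.load(nmp_dir + str(part_id) + '.npy')

    def corr(self, date_ranges, symbol_id):
        # Placeholder: the real correlation math is in daskClientAssetCorr.py.
        # Assumes one row per stock and one column per date_id in the partition.
        cols = np.concatenate([np.arange(lo, hi) for lo, hi in date_ranges])
        return self.data[:, cols].shape    # return something JSON-serialisable

client = Client('scheduler_node_ip:8786')
app = Flask(__name__)

# One actor per partition, pinned to the dask-worker whose --name matches the partition id.
actors = [client.submit(CorrActor, i, workers=str(i), actor=True).result()
          for i in range(2)]

@app.route('/api/corr', methods=['POST'])
def corr():
    req = request.get_json()               # {"d": [[lo, hi], ...], "s": symbol_id}
    futures = [a.corr(req['d'], req['s']) for a in actors]   # ActorFutures
    return jsonify([f.result() for f in futures])

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)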

//Test the flask endpoint using curl from the driver node:
//d is an array of date_id ranges
//s is the input symbol_id
time curl -o a.out -d '{"d":[[10,100],[463,1057]],"s":16}' -H 'Content-Type: application/json' -X POST http://127.0.0.1:5000/api/corr
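
//The same request can also be issued from Python; a small sketch using the requests library
//(not in the dependency list above, so install it separately if you want to test this way):

# Sketch: same POST as the curl command above, via the requests library.
import requests

payload = {"d": [[10, 100], [463, 1057]], "s": 16}   # date_id ranges and symbol_id
r = requests.post('http://127.0.0.1:5000/api/corr', json=payload)
print(r.status_code, r.json())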


//To create the numpy arrays from flat files, use createNumpys.py:
stocks_per_core=628
stock_count=1255
date_count=1258
core_count=2
//Make sure the above four variables are correct. Note that stocks_per_core = ceil(stock_count / core_count); a sketch of the partitioning follows below.
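
//A rough sketch of the partitioning that createNumpys.py performs with these variables.
//The flat-file name prices.csv and its layout are assumptions; adjust to your own input.

# Hypothetical sketch of the partitioning step; the real logic is in createNumpys.py.
import math
import numpy as np

stock_count = 1255
date_count = 1258
core_count = 2
stocks_per_core = math.ceil(stock_count / core_count)    # 628

# Assumed flat-file layout: one row per stock, one column per date_id.
prices = np.loadtxt('prices.csv', delimiter=',')          # shape (stock_count, date_count)

for part in range(core_count):
    lo = part * stocks_per_core
    hi = min(lo + stocks_per_core, stock_count)
    np.save(str(part) + '.npy', prices[lo:hi])            # 0.npy, 1.npy, ... one per dask-worker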


Note that each numpy array is the partition unit for parallelism:
there are as many numpy arrays as there are dask-workers.
The dask-workers are named 0, 1, etc. and operate on 0.npy and 1.npy respectively in this case.
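
//A quick way to check this worker-name to partition-file mapping from the driver node (sketch using
//client.run, which passes the local worker object to a function that takes a dask_worker argument):

# Sketch: verify that each named worker can see its partition file.
import os
from distributed import Client

client = Client('scheduler_node_ip:8786')
nmp_dir = '/root/dask/numpy/'

def has_partition(dask_worker):
    # dask passes the local Worker object; its .name is the --name given at startup (0, 1, ...)
    return os.path.exists(os.path.join(nmp_dir, str(dask_worker.name) + '.npy'))

print(client.run(has_partition))   # {worker_address: True/False, ...}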
