1. Setup
- Python 3.4 on Linux, Mac or Windows
- Numpy, SciPy and Pandas (only required when using the data/workflow generator)
- MonetDB (https://www.monetdb.org/)
- Python Client API for MonetDB:
pip install pymonetdb
Go to the data directory and unzip flights.zip so that you get the following path: data/flights
To generate the default domestic flight dataset, run the following command:
python datagen.py --sample-file data/flights/sample.csv --sample-descriptor data/flights/sample.json --size 500000
This will write 500K rows to a CSV file named dataset.csv.
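To check that the generator wrote the expected number of rows, a small script along the following lines may help. It is only a sketch: count_data_rows is a hypothetical helper, not part of IDEBench, and it assumes that dataset.csv starts with a single header line (the COPY OFFSET 2 import statement used for MonetDB further down suggests this).

```python
import csv


def count_data_rows(csv_path):
    """Count the rows of a CSV file, excluding the first (header) line."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)
```

For a dataset generated with --size 500000, count_data_rows("dataset.csv") should then return 500000, provided the path points to the file written by datagen.py.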
To generate a workflow of a certain type and for a specific dataset, use the following command:
python workflowgen.py --dataset flights --workflow-type independent.json --output myworkflow
This will create a file named data/flights/workflows/myworkflow.json that contains 20 interactions (by default). The Workflow Generator uses a random seed, which can be set manually with the --seed option (e.g., --seed 42). This can be used, for instance, to create multiple workflows of the same type.
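Generating several workflows of the same type with different seeds can be scripted. The sketch below only uses the workflowgen.py options shown above; the helper names and the myworkflow_seedN output naming scheme are assumptions made for illustration. It uses subprocess.check_call so that it also runs on Python 3.4.

```python
import subprocess
import sys


def workflowgen_command(seed, dataset="flights", workflow_type="independent.json"):
    """Build the argument list for one workflowgen.py call with a fixed seed."""
    output = "myworkflow_seed{}".format(seed)  # hypothetical naming scheme
    return [
        sys.executable, "workflowgen.py",
        "--dataset", dataset,
        "--workflow-type", workflow_type,
        "--output", output,
        "--seed", str(seed),
    ]


def generate_workflows(seeds):
    """Run workflowgen.py once per seed; must be called from IDEBench's root directory."""
    for seed in seeds:
        subprocess.check_call(workflowgen_command(seed))
```

Calling generate_workflows([1, 2, 3]) would then invoke the generator three times, once per seed, each with its own output name.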
For other workflow types, see data/flights/workflowtypes.
Alternatively, if you want to use IDEBench's default workflows, copy the workflows located in data/flights/default_workflows to data/flights/workflows.
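If you prefer to script the copy step, something like the following should work. This is a sketch: copy_default_workflows is a hypothetical helper, and it assumes that the default workflows are stored as .json files, like the generated ones.

```python
import glob
import os
import shutil


def copy_default_workflows(src="data/flights/default_workflows",
                           dst="data/flights/workflows"):
    """Copy all .json workflow files from src to dst and return their names."""
    os.makedirs(dst, exist_ok=True)  # create the target folder if it is missing
    copied = []
    for path in sorted(glob.glob(os.path.join(src, "*.json"))):
        shutil.copy(path, dst)
        copied.append(os.path.basename(path))
    return copied
```

The default paths are relative, so the function is meant to be called from IDEBench's root directory.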
IDEBench comes with a sample driver. To test whether your environment is set up correctly, run the following command from IDEBench's root directory.
python idebench.py --run-config runconfig_sample.json
This will run the test workflow (see data/flights/workflows) using the sample driver (see drivers/sample.py).
Upon completion, you will find a detailed report in CSV format in the reports folder.
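To take a quick look at the most recent report, a sketch like the one below can be used. The column names of the report are not described here, so the code makes no assumptions about them and only returns the header and the number of data rows; latest_report and summarize_report are hypothetical helpers, not part of IDEBench.

```python
import csv
import glob
import os


def latest_report(reports_dir="reports"):
    """Return the most recently modified CSV file in reports_dir, or None."""
    paths = glob.glob(os.path.join(reports_dir, "*.csv"))
    return max(paths, key=os.path.getmtime) if paths else None


def summarize_report(path):
    """Return the header columns and the number of data rows of a report."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first line is assumed to be the header
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, summarize_report(latest_report()) returns the column names and row count of the newest report, as long as the reports folder contains at least one CSV file.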
Note that the sample driver is merely a stub implementation of an IDEBench driver. It does not actually execute any queries.
We recommend setting up MonetDB on your system as it serves as a baseline to compare to other systems, and can be easily used to compute the ground-truth for generated workflows.
- Make sure MonetDB is installed on your system as well as the pymonetdb package for Python.
- Create a new database and import the data generated in Step 1 into it with the commands below (for more information, see the MonetDB Docs):
CREATE TABLE tbl_flights (YEAR_DATE int,UNIQUE_CARRIER char(100),ORIGIN char(100),ORIGIN_STATE_ABR char(2),DEST char(100),DEST_STATE_ABR char(2),DEP_DELAY double,TAXI_OUT double,TAXI_IN double,ARR_DELAY double,AIR_TIME double,DISTANCE double);
COPY OFFSET 2 INTO tbl_flights FROM '{ABSOLUTE_PATH_TO_DATASET.CSV}' DELIMITERS ',','\n','"';
- Open drivers/monetdb.py and make sure the host, port and credentials to connect to your local MonetDB installation are set correctly.
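The import step can also be driven from Python through pymonetdb. The following is a minimal sketch under stated assumptions: build_copy_statement and load_dataset are hypothetical helpers, the table tbl_flights is assumed to exist already (see the CREATE TABLE statement above), and the connection defaults (localhost, port 50000, user and password monetdb) are MonetDB's usual defaults and may differ on your system.

```python
import os


def build_copy_statement(csv_path, table="tbl_flights"):
    """Build the COPY statement shown above with an absolute path filled in."""
    abs_path = os.path.abspath(csv_path)
    # Double single quotes so the path stays a valid SQL string literal.
    # Note: backslashes in Windows paths may need additional escaping.
    quoted = abs_path.replace("'", "''")
    return ("COPY OFFSET 2 INTO {} FROM '{}' "
            "DELIMITERS ',','\\n','\"';").format(table, quoted)


def load_dataset(csv_path, database, hostname="localhost", port=50000,
                 username="monetdb", password="monetdb"):
    """Run the COPY statement against a MonetDB database (assumed defaults)."""
    import pymonetdb  # imported lazily so the helper above works without it

    connection = pymonetdb.connect(username=username, password=password,
                                   hostname=hostname, port=port,
                                   database=database)
    try:
        cursor = connection.cursor()
        cursor.execute(build_copy_statement(csv_path))
        connection.commit()
    finally:
        connection.close()
```

Building the statement separately makes it easy to print and inspect it before anything is executed against the database.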
With MonetDB set up, you can now compute the ground-truth for all workflows of a dataset. For instance, to compute the ground-truth for all workflows of the flights dataset, run the following command:
python idebench.py --settings-dataset flights --settings-size 500K --driver-name monetdb --groundtruth
This will place the ground-truths for all workflows in the data/flights/groundtruths folder.