1. Setup

philllies edited this page Jun 5, 2020 · 14 revisions

Installation

  • Python 3.4 on Linux, Mac or Windows
  • Numpy, SciPy and Pandas (only required when using the data/workflow generator)
  • MonetDB (https://www.monetdb.org/)
  • Python Client API for MonetDB: pip install pymonetdb

Step 0: Unzip the default dataset

Go to the data directory and unzip flights.zip so that you get the following path: data/flights

Step 1: Generate a Dataset

To generate the default domestic flight dataset, run the following command:

python datagen.py --sample-file data/flights/sample.csv --sample-descriptor data/flights/sample.json --size 500000

This will write 500K rows to a CSV file named dataset.csv.
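
As a quick sanity check, the row count of the generated file can be verified with a few lines of Python. This is only a sketch: it assumes dataset.csv is written to the current directory and starts with a header row (suggested by the OFFSET 2 in the import command of Step 4). The helper name count_data_rows is made up for this example and is not part of IDEBench.

```python
import csv
import os


def count_data_rows(csv_path, has_header=True):
    """Count the data rows in a CSV file, skipping the header if there is one."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # assumption: the first line holds column names
        return sum(1 for _ in reader)


if __name__ == "__main__":
    # Assumption: datagen.py wrote dataset.csv to the current directory.
    if os.path.exists("dataset.csv"):
        print("data rows:", count_data_rows("dataset.csv"))
    else:
        print("dataset.csv not found; run datagen.py first")
```

If the count differs from the value passed to --size, check whether the file really has a header line and adjust has_header accordingly.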

Step 2: Generate the Workload

To generate a workflow of a certain type and for a specific dataset, use the following command:

python workflowgen.py --dataset flights --workflow-type independent.json --output myworkflow

This will create a file named data/flights/workflows/myworkflow.json that contains 20 interactions (by default). The Workflow Generator uses a random seed, which can be manually set by using the --seed option (e.g., --seed 42). This can be used, for instance, to create multiple workflows of the same type. For other workflow types, see data/flights/workflowtypes.
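
To create several workflows of the same type with different seeds, the command above can be wrapped in a small script. The sketch below uses only the options shown in this step (--dataset, --workflow-type, --output, --seed). The output naming scheme myworkflow_seedN and the function names are assumptions made for illustration, and the script must be run from IDEBench's root directory.

```python
import subprocess
import sys


def workflowgen_command(seed, dataset="flights", workflow_type="independent.json"):
    """Build the workflowgen.py command line for one seed."""
    output = "myworkflow_seed{}".format(seed)  # hypothetical naming scheme
    return [
        sys.executable, "workflowgen.py",
        "--dataset", dataset,
        "--workflow-type", workflow_type,
        "--output", output,
        "--seed", str(seed),
    ]


def generate_workflows(seeds, dry_run=True):
    """Print (dry run) or execute one workflowgen.py call per seed."""
    commands = [workflowgen_command(seed) for seed in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    generate_workflows([1, 2, 3])
```

Pass dry_run=False once the printed commands look right.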

Alternatively, if you want to use IDEBench's default workflows, copy and paste the workflows located in data/flights/default_workflows to data/flights/workflows.

Step 3: Test the Sample Driver

IDEBench comes with a sample driver. To test whether your environment is set up correctly, run the following command from IDEBench's root directory.

python idebench.py --run-config runconfig_sample.json

This will run the test workflow (see data/flights/workflows) using the sample driver (see drivers/sample.py). Upon completion you will find a detailed report in CSV format in the reports folder. Note that the sample driver is merely a stub implementation of an IDEBench driver. It does not actually execute any queries.
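
To take a first look at the result, the most recent report can be located and summarized as sketched below. The report's file name and columns are not documented on this page, so the code makes no assumption about them beyond a header row: it simply picks the newest .csv file in the reports folder and prints its header and row count. Both function names are made up for this example.

```python
import csv
import glob
import os


def newest_report(reports_dir="reports"):
    """Return the most recently modified CSV file in reports_dir, or None."""
    paths = glob.glob(os.path.join(reports_dir, "*.csv"))
    return max(paths, key=os.path.getmtime) if paths else None


def summarize_report(path):
    """Return (header, number_of_data_rows) for a CSV report."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[0] if rows else []  # assumption: first line is a header
    return header, max(len(rows) - 1, 0)


if __name__ == "__main__":
    report = newest_report()
    if report is None:
        print("no CSV report found in the reports folder")
    else:
        header, n_rows = summarize_report(report)
        print(report)
        print("columns:", header)
        print("rows:", n_rows)
```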

Step 4: Set Up MonetDB

We recommend setting up MonetDB on your system because it serves as a baseline for comparison with other systems and can easily be used to compute the ground-truth for generated workflows.

  1. Make sure MonetDB is installed on your system as well as the pymonetdb package for Python.
  2. Create a new database and import the data generated in Step 1 into it with the commands below (for more information, see the MonetDB Docs):
CREATE TABLE tbl_flights (YEAR_DATE int,UNIQUE_CARRIER char(100),ORIGIN char(100),ORIGIN_STATE_ABR char(2),DEST char(100),DEST_STATE_ABR char(2),DEP_DELAY double,TAXI_OUT double,TAXI_IN double,ARR_DELAY double,AIR_TIME double,DISTANCE double);
COPY OFFSET 2 INTO tbl_flights FROM '{ABSOLUTE_PATH_TO_DATASET.CSV}' DELIMITERS ',','\n','"';
  3. Open drivers/monetdb.py and make sure the host, port and credentials to connect to your local MonetDB installation are set correctly.
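
The import can also be scripted with pymonetdb, as in the sketch below. It builds the COPY statement shown above with the absolute path filled in and, optionally, executes it. It assumes that tbl_flights has already been created, that the CSV file is readable by the MonetDB server, and that the server uses MonetDB's usual defaults (port 50000, user and password monetdb). The database name idebench and both function names are placeholders chosen for this example; replace them with your own values.

```python
import os


def copy_statement(csv_path, table="tbl_flights"):
    """Build the COPY INTO statement from Step 4 with an absolute file path."""
    abs_path = os.path.abspath(csv_path)
    # OFFSET 2 skips the first line of the file, as in the statement above.
    return (
        "COPY OFFSET 2 INTO {} FROM '{}' "
        "DELIMITERS ',','\\n','\"';"
    ).format(table, abs_path)


def load_dataset(csv_path, database="idebench", hostname="localhost",
                 port=50000, username="monetdb", password="monetdb"):
    """Run the COPY statement and return the resulting row count.

    The database name is a placeholder; the table must already exist.
    """
    import pymonetdb  # imported here so copy_statement works without it

    conn = pymonetdb.connect(database=database, hostname=hostname, port=port,
                             username=username, password=password)
    try:
        cursor = conn.cursor()
        cursor.execute(copy_statement(csv_path))
        conn.commit()
        cursor.execute("SELECT COUNT(*) FROM tbl_flights")
        return cursor.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Only prints the statement; call load_dataset("dataset.csv") to run it.
    print(copy_statement("dataset.csv"))
```

The same connection values should match what is configured in drivers/monetdb.py.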

Step 5: Compute Ground-truths

With MonetDB set up, you can now compute the ground-truth for all workflows of a dataset. For instance, to compute the ground-truth for all workflows of the flights dataset, run the following command:

python idebench.py --settings-dataset flights --settings-size 500K --driver-name monetdb --groundtruth

This will place the ground-truths for all workflows in the data/flights/groundtruths folder.