Performance Tests
Authors: Hailey Pan, Zuozhi Wang
Reviewed by Chen Li
09/14/2016
We want to set up benchmarks and test the performance of each operator in TextDB. Moreover, we want to make sure that changes to TextDB’s codebase do not slow down performance, so we automatically run the performance tests each time a pull request is merged into the master branch. To achieve these goals, we set up the textdb-perftest project, and this document describes the performance test workflow.
As of 9/25/2016: FINISHED
Code in module: edu.uci.ics.textdb.perftest
The packages dictionarymatcher, keywordmatcher, fuzzytokenmatcher, regexmatcher, and nlpextractor contain the performance test code of each operator. The package runme contains the main functions to start running the performance tests.
We are using the Medline dataset. You can see its description and download files here.
The package medline contains the schema of the Medline dataset.
Data files need to be put in the (textdb directory)/textdb/textdb-perftest/sample-data-files folder. Please put only one data file in this folder; otherwise it will affect how we display the results later.
The perftest-files/queries folder contains a file of sample queries, which is used in testing KeywordMatcher and DictionaryMatcher. The perftest-files/results folder contains the performance test results.
Write index and run performance tests
In the package runme:

- WriteIndex.java writes the index.
- RunTests.java assumes that the index already exists, and runs the performance tests.
- RunPerftests.java writes the index first and then runs the performance tests.

The index is written into the (textdb directory)/textdb/textdb-perftest/index folder.
As we mentioned earlier, we want to automate the performance test process, so we wrote a Python script and use a cron job to run it automatically every day. The Python script build.py pulls changes from GitHub, then runs the performance tests if there is a change in the master branch.
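As a rough illustration (not the actual contents of build.py), the change-detection part of such a script could look like the sketch below; the cron schedule in the comment and the function names are assumptions.

```python
# Illustrative sketch only, not the actual build.py.
# A cron entry such as "0 2 * * * python build.py" (an assumed schedule)
# could run this script once a day from the root of the textdb clone.
import subprocess

def git(*args):
    """Run a git command in the current repository and return its output."""
    return subprocess.check_output(["git"] + list(args)).decode().strip()

def master_changed():
    """Pull origin/master and report whether HEAD moved to a new commit."""
    old_head = git("rev-parse", "HEAD")
    git("pull", "origin", "master")
    return git("rev-parse", "HEAD") != old_head

if __name__ == "__main__":
    if master_changed():
        print("master changed; running the performance tests")
        # The command that actually runs the tests is discussed below.
    else:
        print("no change on master; skipping the performance tests")
```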
It’s easy to run the performance tests in an IDE (for example, Eclipse or IntelliJ): we can simply run the Java file, and the IDE takes care of the rest. In a command line environment, however, it’s much harder to run the program, so the command to run it is generated in build.py. (Attention: the command needs to be changed if TextDB’s dependencies change.)
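For illustration only, the command could be assembled along these lines; the jar locations and the main class name below are assumptions, and the real classpath in build.py has to list TextDB’s actual dependencies.

```python
# Illustrative sketch of generating and running the command; the paths,
# jar layout, and main class name are assumptions, not build.py's real values.
import glob
import subprocess

def run_performance_tests():
    # Collect the perftest jar plus its dependency jars (hypothetical layout).
    jars = (glob.glob("textdb/textdb-perftest/target/*.jar")
            + glob.glob("textdb/textdb-perftest/target/dependency/*.jar"))
    classpath = ":".join(jars)

    # Hypothetical fully qualified name of the runme entry point.
    main_class = "edu.uci.ics.textdb.perftest.runme.RunPerftests"

    # If TextDB's dependencies change, the classpath above must change too.
    subprocess.check_call(["java", "-cp", classpath, main_class])

if __name__ == "__main__":
    run_performance_tests()
```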
The Java performance test program writes the results into the “perftest-files/results” folder. There’s one csv file for each operator to record the results of each run.
Here’s a sample format of one csv file “keyword-phrase.csv”:
Date | Record # | Min Time | Max Time | Average Time | Std | Average Results | Commit Number |
---|---|---|---|---|---|---|---|
09-09-2016 00:54:18 | abstract_100 | 0.017 | 1.373 | 0.2371 | 0.4464 | 2.18 |
Other operators’ csv files look similar to the format above.
The “Commit Number” column is empty because we chose to let the Python script fill in the commit number. Running the Java program directly, either via an IDE or the command line, won’t produce a commit number in the result file; the commit number is only added by running the Python script.
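As a sketch of how the script could do this (not the actual code), the current commit hash can be read from git and appended to the newest result row; the file path and row layout below follow the keyword-phrase.csv example above and are otherwise assumptions.

```python
# Illustrative sketch: fill the "Commit Number" column of the most recent
# result row with the current HEAD commit. The CSV path and the assumption
# that the last row ends with an empty commit field are hypothetical.
import subprocess

def current_commit():
    """Return the short hash of the commit the tests were run against."""
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"]).decode().strip()

def fill_commit_number(csv_path="perftest-files/results/keyword-phrase.csv"):
    with open(csv_path) as f:
        lines = f.readlines()
    row = lines[-1].rstrip("\n")
    if not row.endswith(","):
        row += ","                      # open the empty Commit Number field
    lines[-1] = row + current_commit() + "\n"
    with open(csv_path, "w") as f:
        f.writelines(lines)

if __name__ == "__main__":
    fill_commit_number()
```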
We use an open source Java package, DashBuilder, to display the results. DashBuilder automatically reads the results produced by the Python script and displays them. Please refer to the separate internal documentation for setting up DashBuilder.