# Benchmarks
# Machine characteristics

132 GB RAM, 64-core machine, 5 TB SSD drive.
# Software settings

## DB2 configuration
- Intraparallelism was turned on (INTRA_PARALLEL = Yes in DBM CFG)
- Degree of parallelism set to ANY (DFT_DEGREE = ANY in DB CFG)
- Default query optimization class set to 7 (DFT_QUERYOPT = 7 in DB CFG)
- Number of secondary log files set to 200 (LOGSECOND=200 in DB CFG)
- Log file size set to 20,000 4 KB pages (LOGFILSIZ=20000 in DB CFG).
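The settings above can be applied from the DB2 command line; in this sketch, `BENCHDB` is a placeholder for the actual database name.

```shell
# Sketch of applying the listed settings via the db2 CLI.
# BENCHDB is a placeholder database name, not from the original document.
db2 update dbm cfg using INTRA_PARALLEL YES
db2 update db cfg for BENCHDB using DFT_DEGREE ANY
db2 update db cfg for BENCHDB using DFT_QUERYOPT 7
db2 update db cfg for BENCHDB using LOGSECOND 200
db2 update db cfg for BENCHDB using LOGFILSIZ 20000
```

The instance may need to be restarted (`db2stop`/`db2start`) for the database-manager setting to take effect.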
Known problems and workarounds:
- DB2 has a self-tuning feature that automatically adjusts its resource usage based on the machine characteristics and workload. This works well, but the system needs several runs of a query workload before it performs well.
- DB2 has a known problem with statistics for URIs longer than 32 characters. URIs are long, and the statistics used by the optimizer get confused when URIs are truncated to 32 characters. We have typically reversed the URIs (and the constants in subsequent queries) to work around this problem.
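The URI-reversal workaround can be sketched as follows; `reverse_uri` is an illustrative helper, not part of the Quetzal code.

```python
# Illustrative sketch of the URI-reversal workaround described above (not the
# actual Quetzal implementation). Reversing the string moves the distinguishing
# tail of a long URI into the first 32 characters, where DB2's truncated
# column statistics can still tell values apart.
def reverse_uri(uri: str) -> str:
    return uri[::-1]
```

Query constants must be reversed the same way, and results reversed back before being returned.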
## PostgreSQL configuration (Version 9.3.5)

We edited the following settings in postgresql.conf:
- effective_cache_size = 96GB
- seq_page_cost = 1.0 # planner page costs are on an arbitrary relative scale
- random_page_cost = 1.0 # equal to seq_page_cost, since random reads on an SSD cost about the same as sequential reads
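These three parameters do not require a server restart; after editing postgresql.conf, a configuration reload is enough, and the values can be checked from psql. A minimal sketch (assuming `$PGDATA` points at the data directory):

```shell
# Reload the edited postgresql.conf and verify the settings took effect.
pg_ctl reload -D "$PGDATA"
psql -c "SHOW effective_cache_size"
psql -c "SHOW seq_page_cost"
psql -c "SHOW random_page_cost"
```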
## Spark configuration (Version 1.2.1)

Note: the code in Quetzal still supports Spark 1.1.1, not 1.2.1; the code will be updated once docker support can be provided.

We edited the following in SPARK_HOME/conf/spark-env.sh:
- export SPARK_EXECUTOR_MEMORY=50G
- export SPARK_WORKER_MEMORY=50G
**Note:** all tables are declared as Hive tables using the plain text file format, and the fixed query costs we observe here may be high because of that. Storing the tables in a columnar format such as Parquet or ORC might reduce those fixed costs, and it is unclear whether Hive settings such as predicate pushdown would also help. The numbers here provide a useful baseline measurement.
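One way to test the columnar-format hypothesis would be to rewrite a text-format table as Parquet through the Spark SQL CLI. A hypothetical sketch, where the table name `triples` is a placeholder, not from the original document:

```shell
# Hypothetical sketch: copy a plain-text Hive table into a Parquet-backed
# table to compare fixed per-query costs. "triples" is a placeholder name.
$SPARK_HOME/bin/spark-sql -e \
  "CREATE TABLE triples_parquet STORED AS PARQUET AS SELECT * FROM triples"
```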
# LUBM 100M

LUBM is a benchmark designed to test OWL reasoners, and tests whether systems can perform OWL-DL reasoning. Because the Quetzal libraries support only the OWL-QL profile, we expanded the LUBM queries using the ontology. Queries can be found here. The table below shows query performance in milliseconds, averaged across 10 runs of each query (the first run is discarded, and mean times are computed for the remaining 9 runs). Both systems (DB2 and PostgreSQL) were given the opportunity to 'self tune' through warm-up runs prior to this measurement. Standard deviations are in parentheses.
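The measurement protocol above can be sketched as a small helper. Whether the reported deviations are sample or population standard deviations is our assumption; `statistics.stdev` below computes the sample standard deviation.

```python
# Sketch of the timing methodology: 10 timed runs of a query, the first
# (warm-up) run discarded, then mean and sample standard deviation over
# the remaining 9 runs.
import statistics

def summarize(times_ms):
    measured = times_ms[1:]  # drop the first (warm-up) run
    return statistics.mean(measured), statistics.stdev(measured)
```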
Query | DB2 (ms) | PostgreSQL (ms) | Spark (ms) |
---|---|---|---|
Q1 | 32.89 (0.63) | 41.67 (0.89) | 29481.3 (105.89) |
Q2 | 5612.44 (475.15) | 113917 (36.26) | 41324 (172.67) |
Q3 | 23 (0) | 24.56 (0.19) | 17840 (305.67) |
Q4 | 1152.78 (17.93) | 179.11 (4.37) | 154509 (1828.59) |
Q5 | 63.11 (1.04) | 62.67 (1.22) | 33745.1 (247.04) |
Q6 | 20447.9 (10.96) | 41203.3 (183.11) | 100533 (1568.04) |
Q7 | 32.11 (0.37) | 38.33 (0.44) | 31371.4 (426.82) |
Q8 | 337.44 (0.52) | 3628.44 (4.48) | 177521 (2236.41) |
Q9 | 6721.56 (56.81) | 156218 (148.59) | 60044.9 (491.96) |
Q10 | 23.67 (0.22) | 24.89 (0.30) | 18502.8 (327.26) |
Q13 | 77.56 (1.52) | 71.56 (2.19) | 45479.8 (1170.59) |
Q14 | 8300 (103) | 26107.2 (617.07) | 45727.6 (178.52) |
Queries 2 and 9 in PostgreSQL do not appear to use the available indexes. We are not experts in PostgreSQL tuning, but we tried hard, without success, to get the query optimizer to use the indexes. If any PostgreSQL experts see an obvious problem with these queries, please let us know.
# SP2B 100M

SP2B is a benchmark about DBLP authors, articles, and journals. We used the benchmark as is, except that Q4 requires a cross product, which does not work on any system, as we describe in our SIGMOD 2013 paper. A 20-minute timeout was set on the PostgreSQL connection. SP2B is a complex benchmark requiring the manipulation of very large temporary tables. Again, PostgreSQL's performance is poor on these queries, but that may reflect a suboptimal PostgreSQL configuration; any suggestions on how to improve its performance are appreciated. For Queries 2, 6, and 11 on Spark, there were errors in handling the SQL or exceptions thrown in Spark code during query processing, so no times are reported.
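One way to impose such a timeout is PostgreSQL's `statement_timeout`, which aborts any statement running longer than the given duration. A sketch, where the database name `sp2b` is a placeholder and this may not be the exact mechanism used in the benchmark:

```shell
# Make every new connection to the benchmark database inherit a
# 20-minute per-statement timeout. "sp2b" is a placeholder name.
psql -c "ALTER DATABASE sp2b SET statement_timeout = '20min'"
```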
Query | DB2 (ms) | PostgreSQL (ms) | Spark (ms) |
---|---|---|---|
Q1 | 37.67 (0.56) | 52.22 (1.74) | 34143.7 (3528.56) |
Q2 | 140469 (126.30) | 462350 (1464.7) | - |
Q3a | 2277.11 (50.30) | 66134.2 (66.59) | 34636.4 (640.48) |
Q3b | 381 (7.67) | 62195.2 (113.93) | 17890.7 (974.89) |
Q3c | 4471.56 (27.81) | 61915.2 (30.41) | 20199.6 (313.82) |
Q5a | 46961.2 (75.74) | Timeout | 191505 (2264.67) |
Q5b | 23263.7 (44.89) | 713639 (44.48) | 168310 (1425.78) |
Q6 | 688255 (9324.52) | 1.095e+06 (214.48) | - |
Q7 | 71609.2 (952.41) | Timeout | 793350 (596.56) |
Q8 | 431.67 (4.11) | 9174.89 (9.30) | 188215 (5209.15) |
Q9 | 42361.3 (190.22) | Timeout | 578510 (1438.78) |
Q10 | 31 (0.67) | 11094.2 (14.07) | 133495 (4749.04) |
Q11 | 7381 (4) | 17418.4 (177.52) | - |
Q12a | 74.22 (0.41) | 14854.8 (83.59) | 220484 (4512.15) |
Q12b | 120.33 (1.78) | 187.44 (2.15) | 175138 (4946.52) |
Q12c | 16.55 (0.19) | 25.89 (0.96) | 21438.7 (2455.56) |
# DBpedia

We used the DBpedia SPARQL benchmark with the DBpedia 3.7 dataset. Because the original queries are really templates that were applied to the dataset, we have included the queries used in our benchmark here. Note that type casts are implemented differently from the SIGMOD 2013 paper: here we use the XML typecasting functions packaged with DB2 and PostgreSQL, for greater generality across different backends. DB2 performs worse on some queries in this implementation because of these type casts.
Query | DB2 (ms) | PostgreSQL (ms) |
---|---|---|
Q1 | 22.11 (1.37) | 19.89 (0.30) |
Q2 | 41.44 (5.15) | 80.22 (2.07) |
Q3 | 2880.11 (1.63) | 88.56 (1.52) |
Q4 | 7 (0.33) | 6.33 (0.44) |
Q5 | 171.78 (5.93) | 244.57 (7.85) |
Q6 | 3186.22 (8.41) | 35498.1 (20.70) |
Q7 | 820.44 (3.48) | 172.89 (4.63) |
Q8 | 7993.67 (137.44) | 13122 (28.67) |
Q9 | 77.44 (2.81) | 82.22 (2.74) |
Q10 | 36.11 (1.04) | 67.44 (4.15) |
Q11 | 51.33 (1.78) | 53 (1.33) |
Q12 | 1484.67 (11.78) | 79.67 (1.22) |
Q13 | 439.88 (9.63) | 19053.9 (91.63) |
Q14 | 56.11 (2.37) | 51.44 (1.81) |
Q15 | 42.44 (1.48) | 44.89 (1.96) |
Q16 | 26.67 (0.11) | 47.55 (0.52) |
Q17 | 2230.89 (6.30) | 1207.11 (9.30) |
Q18 | 19521.8 (23.93) | 2150.11 (7.37) |
Q19 | 46.89 (0.37) | 58.44 (0.81) |
Q20 | 337.89 (0.96) | 38 (1.67) |