# Benchmarks
# Machine characteristics

132 GB RAM, 64-core machine, 5 TB SSD drive.
# Software settings

## DB2 configuration
- Intraparallelism was turned on (INTRA_PARALLEL = Yes in DBM CFG)
- Degree of parallelism set to ANY (DFT_DEGREE = ANY in DB CFG)
- Default query optimization class set to 7 (DFT_QUERYOPT = 7 in DB CFG)
- Number of secondary log files set to 200 (LOGSECOND=200 in DB CFG)
- Log file size set to 20,000 4 KB pages (LOGFILSIZ=20000 in DB CFG).
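The settings above can be applied from the DB2 command line; in this sketch, `BENCHDB` is a placeholder for the actual database name.

```shell
# Sketch of applying the listed settings via the db2 CLI.
# BENCHDB is a placeholder database name, not from the original document.
db2 update dbm cfg using INTRA_PARALLEL YES
db2 update db cfg for BENCHDB using DFT_DEGREE ANY
db2 update db cfg for BENCHDB using DFT_QUERYOPT 7
db2 update db cfg for BENCHDB using LOGSECOND 200
db2 update db cfg for BENCHDB using LOGFILSIZ 20000
```

The instance may need to be restarted (`db2stop`/`db2start`) for the database-manager setting to take effect.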
Known problems and workarounds:
- DB2 has a self-tuning feature that automatically adjusts its resource usage based on the machine characteristics and workload. This works well, but the system needs several runs of a query workload before it performs well.
- DB2 has a known problem with statistics for URIs longer than 32 characters. URIs are long, and the statistics used by the optimizer get confused when URIs are truncated to 32 characters. We have typically reversed the URIs (and the constants in subsequent queries) to work around this problem.
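The URI-reversal workaround can be sketched as follows; `reverse_uri` is an illustrative helper, not part of the Quetzal code.

```python
# Illustrative sketch of the URI-reversal workaround described above (not the
# actual Quetzal implementation). Reversing the string moves the distinguishing
# tail of a long URI into the first 32 characters, where DB2's truncated
# column statistics can still tell values apart.
def reverse_uri(uri: str) -> str:
    return uri[::-1]
```

Query constants must be reversed the same way, and results reversed back before being returned.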
## PostgreSQL configuration (Version 9.3.5)

We edited the following settings in postgresql.conf:
- effective_cache_size = 96GB
- seq_page_cost = 1.0 # planner page costs are on an arbitrary relative scale
- random_page_cost = 1.0 # equal to seq_page_cost, since random reads on an SSD cost about the same as sequential reads
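These three parameters do not require a server restart; after editing postgresql.conf, a configuration reload is enough, and the values can be checked from psql. A minimal sketch (assuming `$PGDATA` points at the data directory):

```shell
# Reload the edited postgresql.conf and verify the settings took effect.
pg_ctl reload -D "$PGDATA"
psql -c "SHOW effective_cache_size"
psql -c "SHOW seq_page_cost"
psql -c "SHOW random_page_cost"
```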
## Spark configuration (Version 1.2.1)

Note: the code in Quetzal still supports Spark 1.1.1, not 1.2.1; the code will be updated once docker support can be provided.

We edited the following in SPARK_HOME/conf/spark-env.sh:
- export SPARK_EXECUTOR_MEMORY=50G
- export SPARK_WORKER_MEMORY=50G
**Note:** all tables are declared as Hive tables using the plain text file format, and the fixed query costs we observe here may be high because of that. Storing the tables in a columnar format such as Parquet or ORC might reduce those fixed costs, and it is unclear whether Hive settings such as predicate pushdown would also help. The numbers here provide a useful baseline measurement.
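One way to test the columnar-format hypothesis would be to rewrite a text-format table as Parquet through the Spark SQL CLI. A hypothetical sketch, where the table name `triples` is a placeholder, not from the original document:

```shell
# Hypothetical sketch: copy a plain-text Hive table into a Parquet-backed
# table to compare fixed per-query costs. "triples" is a placeholder name.
$SPARK_HOME/bin/spark-sql -e \
  "CREATE TABLE triples_parquet STORED AS PARQUET AS SELECT * FROM triples"
```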
# LUBM 100M

LUBM is a benchmark designed to test OWL reasoners, and tests whether systems can perform OWL-DL reasoning. Because the Quetzal libraries support only the OWL-QL profile, we expanded the LUBM queries using the ontology. Queries can be found here. The table below shows query performance in milliseconds, averaged across 10 runs of each query (the first run is discarded, and mean times are computed for the remaining 9 runs). Both systems (DB2 and PostgreSQL) were given the opportunity to 'self tune' through warm-up runs prior to this measurement. Standard deviations are in parentheses.
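The measurement protocol above can be sketched as a small helper. Whether the reported deviations are sample or population standard deviations is our assumption; `statistics.stdev` below computes the sample standard deviation.

```python
# Sketch of the timing methodology: 10 timed runs of a query, the first
# (warm-up) run discarded, then mean and sample standard deviation over
# the remaining 9 runs.
import statistics

def summarize(times_ms):
    measured = times_ms[1:]  # drop the first (warm-up) run
    return statistics.mean(measured), statistics.stdev(measured)
```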
Query | DB2 (ms) | PostgreSQL (ms) | Spark (ms) |
---|---|---|---|
Q1 | 32.89 (0.63) | 41.67 (0.89) | 29481.3 (105.89) |
Q2 | 5612.44 (475.15) | 113917 (36.26) | 41324 (172.67) |
Q3 | 23 (0) | 24.56 (0.19) | 17840 (305.67) |
Q4 | 1152.78 (17.93) | 179.11 (4.37) | 154509 (1828.59) |
Q5 | 63.11 (1.04) | 62.67 (1.22) | 33745.1 (247.04) |
Q6 | 20447.9 (10.96) | 41203.3 (183.11) | 100533 (1568.04) |
Q7 | 32.11 (0.37) | 38.33 (0.44) | 31371.4 (426.82) |
Q8 | 337.44 (0.52) | 3628.44 (4.48) | 177521 (2236.41) |
Q9 | 6721.56 (56.81) | 156218 (148.59) | 60044.9 (491.96) |
Q10 | 23.67 (0.22) | 24.89 (0.30) | 18502.8 (327.26) |
Q13 | 77.56 (1.52) | 71.56 (2.19) | 45479.8 (1170.59) |
Q14 | 8300 (103) | 26107.2 (617.07) | 45727.6 (178.52) |
Queries 2 and 9 in PostgreSQL do not appear to use the available indexes. We are not experts in PostgreSQL tuning, but we tried hard, without success, to get the query optimizer to use the indexes. If any PostgreSQL experts see an obvious problem with these queries, please let us know.
# SP2B 100M

SP2B is a benchmark about DBLP authors, articles, and journals. We used the benchmark as is, except that Q4 requires a cross product, which does not work on any system, as we describe in our SIGMOD 2013 paper. A 20-minute timeout was set on the PostgreSQL connection. SP2B is a complex benchmark requiring the manipulation of very large temporary tables. Again, PostgreSQL's performance is poor on these queries, but that may reflect a suboptimal PostgreSQL configuration; any suggestions on how to improve its performance are appreciated. For Queries 2, 6, and 11 on Spark, there were errors in handling the SQL or exceptions thrown in Spark code during query processing, so no times are reported.
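One way to impose such a timeout is PostgreSQL's `statement_timeout`, which aborts any statement running longer than the given duration. A sketch, where the database name `sp2b` is a placeholder and this may not be the exact mechanism used in the benchmark:

```shell
# Make every new connection to the benchmark database inherit a
# 20-minute per-statement timeout. "sp2b" is a placeholder name.
psql -c "ALTER DATABASE sp2b SET statement_timeout = '20min'"
```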
Query | DB2 (ms) | PostgreSQL (ms) | Spark (ms) |
---|---|---|---|
Q1 | 37.67 (0.56) | 52.22 (1.74) | 34143.7 (3528.56) |
Q2 | 140469 (126.30) | 462350 (1464.7) | - |
Q3a | 2277.11 (50.30) | 66134.2 (66.59) | 34636.4 (640.48) |
Q3b | 381 (7.67) | 62195.2 (113.93) | 17890.7 (974.89) |
Q3c | 4471.56 (27.81) | 61915.2 (30.41) | 20199.6 (313.82) |
Q5a | 46961.2 (75.74) | Timeout | 191505 (2264.67) |
Q5b | 23263.7 (44.89) | 713639 (44.48) | 168310 (1425.78) |
Q6 | 688255 (9324.52) | 1.095e+06 (214.48) | - |
Q7 | 71609.2 (952.41) | Timeout | 793350 (596.56) |
Q8 | 431.67 (4.11) | 9174.89 (9.30) | 188215 (5209.15) |
Q9 | 42361.3 (190.22) | Timeout | 578510 (1438.78) |
Q10 | 31 (0.67) | 11094.2 (14.07) | 133495 (4749.04) |
Q11 | 7381 (4) | 17418.4 (177.52) | - |
Q12a | 74.22 (0.41) | 14854.8 (83.59) | 220484 (4512.15) |
Q12b | 120.33 (1.78) | 187.44 (2.15) | 175138 (4946.52) |
Q12c | 16.55 (0.19) | 25.89 (0.96) | 21438.7 (2455.56) |
# DBpedia

We used the DBpedia SPARQL benchmark with the DBpedia 3.7 dataset. Because the original queries are really templates that were applied to the dataset, we have included the queries used in our benchmark here. Note that type casts are implemented differently from the SIGMOD 2013 paper: here we use the XML typecasting functions packaged with DB2 and PostgreSQL, for greater generality across different backends. DB2 performs worse on some queries in this implementation because of these type casts.
Query | DB2 (ms) | PostgreSQL (ms) |
---|---|---|
Q1 | 22.11 (1.37) | 19.89 (0.30) |
Q2 | 41.44 (5.15) | 80.22 (2.07) |
Q3 | 2880.11 (1.63) | 88.56 (1.52) |
Q4 | 7 (0.33) | 6.33 (0.44) |
Q5 | 171.78 (5.93) | 244.57 (7.85) |
Q6 | 3186.22 (8.41) | 35498.1 (20.70) |
Q7 | 820.44 (3.48) | 172.89 (4.63) |
Q8 | 7993.67 (137.44) | 13122 (28.67) |
Q9 | 77.44 (2.81) | 82.22 (2.74) |
Q10 | 36.11 (1.04) | 67.44 (4.15) |
Q11 | 51.33 (1.78) | 53 (1.33) |
Q12 | 1484.67 (11.78) | 79.67 (1.22) |
Q13 | 439.88 (9.63) | 19053.9 (91.63) |
Q14 | 56.11 (2.37) | 51.44 (1.81) |
Q15 | 42.44 (1.48) | 44.89 (1.96) |
Q16 | 26.67 (0.11) | 47.55 (0.52) |
Q17 | 2230.89 (6.30) | 1207.11 (9.30) |
Q18 | 19521.8 (23.93) | 2150.11 (7.37) |
Q19 | 46.89 (0.37) | 58.44 (0.81) |
Q20 | 337.89 (0.96) | 38 (1.67) |