add compact DB mode (--compact-db) to de-duplicate mbtiles output #219

Merged
merged 10 commits on May 24, 2022

Conversation

bbilger (Contributor) commented May 9, 2022

Context

This addresses point 2 of #167 (comment).

Overview

The compact DB mode splits the tiles table into tiles_shallow and tiles_data:

  • tiles_shallow contains the tile coordinates plus a reference to the data ID
  • tiles_data contains the data ID plus the actual tile data

This allows content to be de-duplicated, since multiple tiles can reference the same data.

In this mode, tiles is realized as a view that joins the two tables tiles_shallow and tiles_data, so readers still see the standard tiles schema.
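
To make the layout concrete, here is a minimal sketch of what the split schema could look like when created over JDBC (the table and column names are assumptions for illustration, not necessarily the exact ones used in this PR; it assumes the sqlite-jdbc driver is on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class CompactSchemaSketch {

  /** Creates the split layout; names are illustrative, not taken from this PR. */
  static void createCompactSchema(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
      // coordinates plus a reference to the data ID
      stmt.execute("""
        create table tiles_shallow (
          zoom_level integer,
          tile_column integer,
          tile_row integer,
          tile_data_id integer
        )""");
      // data ID plus the actual (de-duplicated) tile bytes
      stmt.execute("""
        create table tiles_data (
          tile_data_id integer primary key,
          tile_data blob
        )""");
      // readers keep querying "tiles" as before, now backed by a join
      stmt.execute("""
        create view tiles as
        select zoom_level, tile_column, tile_row, tile_data
        from tiles_shallow
        join tiles_data using (tile_data_id)""");
    }
  }

  public static void main(String[] args) throws SQLException {
    // assumes org.xerial:sqlite-jdbc on the classpath
    try (Connection conn = DriverManager.getConnection("jdbc:sqlite:compact-sketch.mbtiles")) {
      createCompactSchema(conn);
    }
  }
}
```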

Impact

  • it can significantly decrease the mbtiles file size, but only if there are many duplicates (e.g. ocean tiles) - for Australia: 3.1GB -> 1.7GB
  • it has little to no impact on writing the mbtiles (slightly slower with no duplicates, and the more duplicates the faster, though not significantly either way)
  • reading from the mbtiles is 10-15% slower because of the indirection through the view

Some more insights can be found here: PR-impact.txt

Concept

The main idea is to compute a hash of each tile's data and maintain a map from that hash to a data ID.
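
As a rough sketch of that idea (the hash function and bookkeeping below are purely illustrative, not the exact implementation in this PR):

```java
import java.util.HashMap;
import java.util.Map;

class TileDataDeduper {
  // hash of the encoded tile bytes -> data ID already assigned to that content;
  // a real implementation needs a hash wide enough that collisions are negligible
  private final Map<Long, Integer> dataIdByHash = new HashMap<>();
  private int nextDataId = 1;

  /** FNV-1a 64-bit hash, used here only as an example hash function. */
  static long fnv1a64(byte[] data) {
    long hash = 0xcbf29ce484222325L;
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= 0x100000001b3L;
    }
    return hash;
  }

  /** Returns the data ID a tile should reference, reusing an existing one for duplicate content. */
  int dataIdFor(byte[] tileData) {
    long hash = fnv1a64(tileData);
    Integer existing = dataIdByHash.get(hash);
    if (existing != null) {
      return existing; // duplicate content: point this tile at the already-written tiles_data row
    }
    int id = nextDataId++;
    dataIdByHash.put(hash, id);
    // the caller then inserts (id, tileData) into tiles_data
    // and (zoom, column, row, id) into tiles_shallow
    return id;
  }
}
```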

Failed Approaches

I was a bit surprised that the impact on writing the mbtiles is little to none. The only explanation I have is that we have to do more inserts than before (into both tiles_shallow and tiles_data). I tried to offload this "dual insert" from the code to the DB: planetiler produced just one insert, but that insert went into a view, which then inserted either into tiles_shallow only or into both tiles_shallow and tiles_data. This, however, increased the overall mbtiles generation time by almost 20%.
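
For reference, here is a sketch of what such a view-plus-trigger variant could look like, building on the hypothetical tables from the schema sketch above (again, names are assumptions and not taken from this PR):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class ViewInsertSketch {

  /** Sets up a writable view so a single insert fans out to one or both tables. */
  static void createWritableView(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute("""
        create view tiles_writable as
        select zoom_level, tile_column, tile_row, tile_data_id, tile_data
        from tiles_shallow
        join tiles_data using (tile_data_id)""");
      // SQLite "instead of" triggers make the view insertable: the payload is
      // stored only once per data ID (relying on the primary key on
      // tiles_data.tile_data_id), the coordinates are stored for every tile
      stmt.execute("""
        create trigger tiles_writable_insert instead of insert on tiles_writable
        begin
          insert or ignore into tiles_data (tile_data_id, tile_data)
            values (new.tile_data_id, new.tile_data);
          insert into tiles_shallow (zoom_level, tile_column, tile_row, tile_data_id)
            values (new.zoom_level, new.tile_column, new.tile_row, new.tile_data_id);
        end""");
    }
  }
}
```

In such a setup the write path would issue a single insert into tiles_writable per tile and let SQLite do the branching; per the numbers above, that turned out noticeably slower than doing the two inserts from code.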

Next

I'd like to address point 1 in #167 (comment), either in this PR or in a follow-up.

github-actions bot commented May 9, 2022

Base 6873b98 vs. This Branch bb5f4c6

Base (6873b98):
0:01:59 DEB [mbtiles] - Tile stats:
0:01:59 DEB [mbtiles] - z0 avg:7.9k max:7.9k
0:01:59 DEB [mbtiles] - z1 avg:4k max:4k
0:01:59 DEB [mbtiles] - z2 avg:9.4k max:9.4k
0:01:59 DEB [mbtiles] - z3 avg:3.9k max:6.4k
0:01:59 DEB [mbtiles] - z4 avg:1.6k max:4.6k
0:01:59 DEB [mbtiles] - z5 avg:1.4k max:8.1k
0:01:59 DEB [mbtiles] - z6 avg:1.4k max:24k
0:01:59 DEB [mbtiles] - z7 avg:1k max:56k
0:01:59 DEB [mbtiles] - z8 avg:476 max:113k
0:01:59 DEB [mbtiles] - z9 avg:298 max:279k
0:01:59 DEB [mbtiles] - z10 avg:165 max:233k
0:01:59 DEB [mbtiles] - z11 avg:108 max:132k
0:01:59 DEB [mbtiles] - z12 avg:86 max:119k
0:01:59 DEB [mbtiles] - z13 avg:72 max:109k
0:01:59 DEB [mbtiles] - z14 avg:68 max:257k
0:01:59 DEB [mbtiles] - all avg:71 max:0
0:01:59 DEB [mbtiles] -  # features: 5,289,610
0:01:59 DEB [mbtiles] -     # tiles: 4,115,450
0:01:59 INF [mbtiles] - Finished in 30s cpu:54s gc:1s avg:1.8
0:01:59 INF [mbtiles] -   read    1x(7% 2s wait:24s)
0:01:59 INF [mbtiles] -   encode  2x(45% 13s wait:7s)
0:01:59 INF [mbtiles] -   write   1x(35% 10s sys:1s wait:16s)
0:02:00 INF - Finished in 2m cpu:3m30s gc:4s avg:1.8
0:02:00 INF - FINISHED!
0:02:00 INF - 
0:02:00 INF - ----------------------------------------
0:02:00 INF - 	overall          2m cpu:3m30s gc:4s avg:1.8
0:02:00 INF - 	lake_centerlines 2s cpu:4s avg:1.7
0:02:00 INF - 	  read     1x(79% 2s)
0:02:00 INF - 	  process  2x(12% 0.3s wait:2s)
0:02:00 INF - 	  write    1x(0% 0s wait:2s)
0:02:00 INF - 	water_polygons   27s cpu:46s gc:2s avg:1.7
0:02:00 INF - 	  read     1x(60% 16s sys:1s wait:2s)
0:02:00 INF - 	  process  2x(27% 7s wait:14s)
0:02:00 INF - 	  write    1x(4% 0.9s wait:26s)
0:02:00 INF - 	natural_earth    10s cpu:16s avg:1.5
0:02:00 INF - 	  read     1x(89% 9s sys:1s)
0:02:00 INF - 	  process  2x(18% 2s wait:9s)
0:02:00 INF - 	  write    1x(0% 0s wait:10s)
0:02:00 INF - 	osm_pass1        4s cpu:7s avg:1.8
0:02:00 INF - 	  read     1x(2% 0.1s wait:4s)
0:02:00 INF - 	  parse    1x(57% 2s wait:1s)
0:02:00 INF - 	  process  1x(52% 2s wait:2s)
0:02:00 INF - 	osm_pass2        34s cpu:1m6s avg:2
0:02:00 INF - 	  read     1x(0% 0s wait:18s done:16s)
0:02:00 INF - 	  process  2x(74% 25s)
0:02:00 INF - 	  write    1x(1% 0.4s wait:33s)
0:02:00 INF - 	boundaries       0s cpu:0.1s avg:1.5
0:02:00 INF - 	sort             4s cpu:5s avg:1.2
0:02:00 INF - 	  worker  1x(85% 4s)
0:02:00 INF - 	mbtiles          30s cpu:54s gc:1s avg:1.8
0:02:00 INF - 	  read    1x(7% 2s wait:24s)
0:02:00 INF - 	  encode  2x(45% 13s wait:7s)
0:02:00 INF - 	  write   1x(35% 10s sys:1s wait:16s)
0:02:00 INF - ----------------------------------------
0:02:00 INF - 	features	269MB
0:02:00 INF - 	mbtiles	515MB
-rw-r--r-- 1 runner docker 55M May 24 17:36 run.jar

This Branch (bb5f4c6):
0:02:01 DEB [mbtiles] - Tile stats:
0:02:01 DEB [mbtiles] - z0 avg:7.9k max:7.9k
0:02:01 DEB [mbtiles] - z1 avg:4k max:4k
0:02:01 DEB [mbtiles] - z2 avg:9.4k max:9.4k
0:02:01 DEB [mbtiles] - z3 avg:3.9k max:6.4k
0:02:01 DEB [mbtiles] - z4 avg:1.6k max:4.6k
0:02:01 DEB [mbtiles] - z5 avg:1.4k max:8.1k
0:02:01 DEB [mbtiles] - z6 avg:1.4k max:24k
0:02:01 DEB [mbtiles] - z7 avg:1k max:56k
0:02:01 DEB [mbtiles] - z8 avg:476 max:113k
0:02:01 DEB [mbtiles] - z9 avg:298 max:279k
0:02:01 DEB [mbtiles] - z10 avg:165 max:233k
0:02:01 DEB [mbtiles] - z11 avg:108 max:132k
0:02:01 DEB [mbtiles] - z12 avg:86 max:119k
0:02:01 DEB [mbtiles] - z13 avg:72 max:109k
0:02:01 DEB [mbtiles] - z14 avg:68 max:257k
0:02:01 DEB [mbtiles] - all avg:71 max:0
0:02:01 DEB [mbtiles] -  # features: 5,289,610
0:02:01 DEB [mbtiles] -     # tiles: 4,115,450
0:02:01 INF [mbtiles] - Finished in 29s cpu:52s avg:1.8
0:02:01 INF [mbtiles] -   read    1x(7% 2s wait:24s)
0:02:01 INF [mbtiles] -   encode  2x(47% 13s wait:7s)
0:02:01 INF [mbtiles] -   write   1x(36% 10s sys:1s wait:16s)
0:02:01 INF - Finished in 2m2s cpu:3m28s gc:4s avg:1.7
0:02:01 INF - FINISHED!
0:02:01 INF - 
0:02:01 INF - ----------------------------------------
0:02:01 INF - 	overall          2m2s cpu:3m28s gc:4s avg:1.7
0:02:01 INF - 	lake_centerlines 2s cpu:4s avg:1.8
0:02:01 INF - 	  read     1x(85% 2s)
0:02:01 INF - 	  process  2x(13% 0.3s wait:2s)
0:02:01 INF - 	  write    1x(0% 0s wait:2s)
0:02:01 INF - 	water_polygons   27s cpu:47s gc:2s avg:1.8
0:02:01 INF - 	  read     1x(59% 16s wait:3s)
0:02:01 INF - 	  process  2x(28% 7s wait:12s)
0:02:01 INF - 	  write    1x(4% 1s wait:25s)
0:02:01 INF - 	natural_earth    12s cpu:17s avg:1.4
0:02:01 INF - 	  read     1x(80% 9s sys:1s done:1s)
0:02:01 INF - 	  process  2x(16% 2s wait:10s done:1s)
0:02:01 INF - 	  write    1x(0% 0s wait:11s done:1s)
0:02:01 INF - 	osm_pass1        4s cpu:8s avg:1.8
0:02:01 INF - 	  read     1x(3% 0.1s wait:4s)
0:02:01 INF - 	  parse    1x(58% 2s wait:1s)
0:02:01 INF - 	  process  1x(53% 2s wait:1s)
0:02:01 INF - 	osm_pass2        32s cpu:1m3s avg:2
0:02:01 INF - 	  read     1x(0% 0.1s wait:16s done:16s)
0:02:01 INF - 	  process  2x(77% 25s)
0:02:01 INF - 	  write    1x(1% 0.4s wait:31s)
0:02:01 INF - 	boundaries       0.1s cpu:0.1s avg:1.5
0:02:01 INF - 	sort             5s cpu:6s avg:1.2
0:02:01 INF - 	  worker  1x(73% 4s)
0:02:01 INF - 	mbtiles          29s cpu:52s avg:1.8
0:02:01 INF - 	  read    1x(7% 2s wait:24s)
0:02:01 INF - 	  encode  2x(47% 13s wait:7s)
0:02:01 INF - 	  write   1x(36% 10s sys:1s wait:16s)
0:02:01 INF - ----------------------------------------
0:02:01 INF - 	features	269MB
0:02:01 INF - 	mbtiles	515MB
-rw-r--r-- 1 runner docker 55M May 24 17:34 run.jar

https://github.com/onthegomap/planetiler/actions/runs/2379534492

ℹ️ Base Logs 6873b98
0:00:00 DEB - argument: config=null (path to config file)
0:00:00 DEB - argument: area=rhode island (name of the extract to download if osm_url/osm_path not specified (i.e. 'monaco' 'rhode island' 'australia' or 'planet'))
0:00:00 INF - Using in-memory stats
0:00:00 INF [overall] - 
0:00:00 INF [overall] - Starting...
0:00:00 DEB - argument: madvise=true (default value for whether to use linux madvise(random) to improve memory-mapped read performance for temporary storage)
0:00:00 DEB - argument: storage=mmap (default storage type for temporary data, one of [ram, mmap, direct])
0:00:00 DEB - argument: threads=2 (num threads)
0:00:00 DEB - argument: write_threads=1 (number of threads to use when writing temp features)
0:00:00 DEB - argument: process_threads=2 (number of threads to use when processing input features)
0:00:00 DEB - argument: bounds=Env[-74.07 : -17.84, 21.34 : 43.55] (bounds)
0:00:00 DEB - argument: loginterval=10 seconds (time between logs)
0:00:00 DEB - argument: minzoom=0 (minimum zoom level)
0:00:00 DEB - argument: maxzoom=14 (maximum zoom level (limit 14))
0:00:00 DEB - argument: defer_mbtiles_index_creation=false (skip adding index to mbtiles file)
0:00:00 DEB - argument: optimize_db=false (optimize mbtiles after writing)
0:00:00 DEB - argument: emit_tiles_in_order=true (emit tiles in index order)
0:00:00 DEB - argument: force=false (overwriting output file and ignore disk/RAM warnings)
0:00:00 DEB - argument: gzip_temp=false (gzip temporary feature storage (uses more CPU, but less disk space))
0:00:00 DEB - argument: mmap_temp=false (use memory-mapped IO for temp feature files)
0:00:00 DEB - argument: sort_max_readers=6 (maximum number of concurrent read threads to use when sorting chunks)
0:00:00 DEB - argument: sort_max_writers=6 (maximum number of concurrent write threads to use when sorting chunks)
0:00:00 DEB - argument: nodemap_type=sparsearray (type of node location map, one of [noop, sortedtable, sparsearray, array])
0:00:00 DEB - argument: nodemap_storage=mmap (storage for node location map, one of [ram, mmap, direct])
0:00:00 DEB - argument: nodemap_madvise=true (use linux madvise(random) for node locations)
0:00:00 DEB - argument: multipolygon_geometry_storage=mmap (storage for multipolygon geometries, one of [ram, mmap, direct])
0:00:00 DEB - argument: multipolygon_geometry_madvise=true (use linux madvise(random) for temporary multipolygon geometry storage)
0:00:00 DEB - argument: http_user_agent=Planetiler downloader (https://github.com/onthegomap/planetiler) (User-Agent header to set when downloading files over HTTP)
0:00:00 DEB - argument: http_timeout=30 seconds (Timeout to use when downloading files over HTTP)
0:00:00 DEB - argument: http_retries=1 (Retries to use when downloading files over HTTP)
0:00:00 DEB - argument: download_chunk_size_mb=100 (Size of file chunks to download in parallel in megabytes)
0:00:00 DEB - argument: download_threads=1 (Number of parallel threads to use when downloading each file)
0:00:00 DEB - argument: min_feature_size_at_max_zoom=0.0625 (Default value for the minimum size in tile pixels of features to emit at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: min_feature_size=1.0 (Default value for the minimum size in tile pixels of features to emit below the maximum zoom level)
0:00:00 DEB - argument: simplify_tolerance_at_max_zoom=0.0625 (Default value for the tile pixel tolerance to use when simplifying features at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: simplify_tolerance=0.1 (Default value for the tile pixel tolerance to use when simplifying features below the maximum zoom level)
0:00:00 DEB - argument: osm_lazy_reads=false (Read OSM blocks from disk in worker threads)
0:00:00 DEB - argument: tmpdir=data/tmp (temp directory)
0:00:00 DEB - argument: only_download=false (download source data then exit)
0:00:00 DEB - argument: download=false (download sources)
0:00:00 DEB - argument: temp_nodes=data/tmp/node.db (temp node db location)
0:00:00 DEB - argument: temp_multipolygons=data/tmp/multipolygon.db (temp multipolygon db location)
0:00:00 DEB - argument: temp_features=data/tmp/feature.db (temp feature db location)
0:00:00 DEB - argument: only_fetch_wikidata=false (fetch wikidata translations then quit)
0:00:00 DEB - argument: fetch_wikidata=false (fetch wikidata translations then continue)
0:00:00 DEB - argument: use_wikidata=true (use wikidata translations)
0:00:00 DEB - argument: wikidata_cache=data/sources/wikidata_names.json (wikidata cache file)
0:00:00 DEB - argument: lake_centerlines_path=data/sources/lake_centerline.shp.zip (lake_centerlines shapefile path)
0:00:00 DEB - argument: free_lake_centerlines_after_read=false (delete lake_centerlines input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: water_polygons_path=data/sources/water-polygons-split-3857.zip (water_polygons shapefile path)
0:00:00 DEB - argument: free_water_polygons_after_read=false (delete water_polygons input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: natural_earth_path=data/sources/natural_earth_vector.sqlite.zip (natural_earth sqlite db path)
0:00:00 DEB - argument: free_natural_earth_after_read=false (delete natural_earth input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: osm_path=data/sources/rhode_island.osm.pbf (osm OSM input file path)
0:00:00 DEB - argument: free_osm_after_read=false (delete osm input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: mbtiles=data/out.mbtiles (mbtiles output file)
0:00:00 DEB - argument: transliterate=true (attempt to transliterate latin names)
0:00:00 DEB - argument: languages=am,ar,az,be,bg,br,bs,ca,co,cs,cy,da,de,el,en,eo,es,et,eu,fi,fr,fy,ga,gd,he,hi,hr,hu,hy,id,is,it,ja,ja_kana,ja_rm,ja-Latn,ja-Hira,ka,kk,kn,ko,ko-Latn,ku,la,lb,lt,lv,mk,mt,ml,nl,no,oc,pl,pt,rm,ro,ru,sk,sl,sq,sr,sr-Latn,sv,ta,te,th,tr,uk,zh (languages to use)
0:00:00 DEB - argument: only_layers= (Include only certain layers)
0:00:00 DEB - argument: exclude_layers= (Exclude certain layers)
0:00:00 DEB - argument: boundary_country_names=true (boundary layer: add left/right codes of neighboring countries)
0:00:00 DEB - argument: transportation_z13_paths=false (transportation(_name) layer: show all paths on z13)
0:00:01 DEB - argument: building_merge_z13=true (building layer: merge nearby buildings at z13)
0:00:01 DEB - argument: transportation_name_brunnel=false (transportation_name layer: set to false to omit brunnel and help merge long highways)
0:00:01 DEB - argument: transportation_name_size_for_shield=false (transportation_name layer: allow road names on shorter segments (ie. they will have a shield))
0:00:01 DEB - argument: transportation_name_limit_merge=false (transportation_name layer: limit merge so we don't combine different relations to help merge long highways)
0:00:01 DEB - argument: transportation_name_minor_refs=false (transportation_name layer: include name and refs from minor road networks if not present on a way)
0:00:01 DEB - argument: mbtiles_name=OpenMapTiles ('name' attribute for mbtiles metadata)
0:00:01 DEB - argument: mbtiles_description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org ('description' attribute for mbtiles metadata)
0:00:01 DEB - argument: mbtiles_attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a> ('attribution' attribute for mbtiles metadata)
0:00:01 DEB - argument: mbtiles_version=3.13.1 ('version' attribute for mbtiles metadata)
0:00:01 DEB - argument: mbtiles_type=baselayer ('type' attribute for mbtiles metadata)
0:00:01 DEB - argument: help=false (show arguments then exit)
0:00:01 INF - Building BasemapProfile profile into data/out.mbtiles in these phases:
0:00:01 INF -   lake_centerlines: Process features in data/sources/lake_centerline.shp.zip
0:00:01 INF -   water_polygons: Process features in data/sources/water-polygons-split-3857.zip
0:00:01 INF -   natural_earth: Process features in data/sources/natural_earth_vector.sqlite.zip
0:00:01 INF -   osm_pass1: Pre-process OpenStreetMap input (store node locations then relation members)
0:00:01 INF -   osm_pass2: Process OpenStreetMap nodes, ways, then relations
0:00:01 INF -   sort: Sort rendered features by tile ID
0:00:01 INF -   mbtiles: Encode each tile and write to data/out.mbtiles
0:00:01 INF - no wikidata translations found, run with --fetch-wikidata to download
0:00:01 DEB - ✓ 194M storage on / (/dev/root) requested for read phase disk, 30G available
0:00:01 DEB -  - 43M used for temporary node location cache
0:00:01 DEB -  - 6.6M used for temporary multipolygon geometry cache
0:00:01 DEB -  - 144M used for temporary feature storage
0:00:01 DEB - ✓ 216M storage on / (/dev/root) requested for write phase disk, 30G available
0:00:01 DEB -  - 144M used for temporary feature storage
0:00:01 DEB -  - 72M used for mbtiles output
0:00:01 DEB - ✓ 312M JVM heap requested for read phase, 4.2G available
0:00:01 DEB -  - 300M used for sparsearray node location in-memory index
0:00:01 DEB -  - 12M used for temporary profile storage
0:00:01 DEB - ✓ 50M storage on / (/dev/root) requested for read phase, 30G available
0:00:01 DEB -  - 43M used for sparsearray node location cache
0:00:01 DEB -  - 6.6M used for multipolygon way geometries
0:00:01 DEB - ✓ 50M temporary files and 2.9G of free memory for OS to cache them
0:00:01 INF - Using merge sort feature map, chunk size=1431mb max workers=2
0:00:02 INF - dataFileCache open start
0:00:02 INF [lake_centerlines] - 
0:00:02 INF [lake_centerlines] - Starting...
0:00:05 INF [lake_centerlines] -  read: [  59k 100%  25k/s ] write: [    0    0/s ] 0    
    cpus: 1.7 gc:  4% heap: 173M/4.2G direct: 237k postGC: 80M
    read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:05 INF [lake_centerlines] - Finished in 2s cpu:4s avg:1.7
0:00:05 INF [lake_centerlines] -   read     1x(79% 2s)
0:00:05 INF [lake_centerlines] -   process  2x(12% 0.3s wait:2s)
0:00:05 INF [lake_centerlines] -   write    1x(0% 0s wait:2s)
0:00:05 INF [water_polygons] - 
0:00:05 INF [water_polygons] - Starting...
0:00:15 INF [water_polygons] -  read: [ 2.1k  15%  213/s ] write: [  41k   4k/s ] 2.6M 
    cpus: 1.9 gc: 11% heap: 1.1G/4.2G direct: 52M postGC: 988M
    read(58%) ->    (0/1k) -> process(16% 36%) -> (1.8k/53k) -> write( 0%)
0:00:25 INF [water_polygons] -  read: [ 4.9k  34%  283/s ] write: [ 299k  25k/s ] 15M  
    cpus: 1.7 gc:  7% heap: 2.4G/4.2G direct: 52M postGC: 1.3G
    read(67%) ->    (0/1k) -> process(25% 28%) -> (1.2k/53k) -> write( 1%)
0:00:31 INF [water_polygons] -  read: [  14k 100% 1.4k/s ] write: [ 4.3M 606k/s ] 186M 
    cpus: 1.5 gc:  7% heap: 2.6G/4.2G direct: 52M postGC: 1.5G
    read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:31 INF [water_polygons] - Finished in 27s cpu:46s gc:2s avg:1.7
0:00:31 INF [water_polygons] -   read     1x(60% 16s sys:1s wait:2s)
0:00:31 INF [water_polygons] -   process  2x(27% 7s wait:14s)
0:00:31 INF [water_polygons] -   write    1x(4% 0.9s wait:26s)
0:00:31 INF [natural_earth] - unzipping /home/runner/work/planetiler/planetiler/data/sources/natural_earth_vector.sqlite.zip to data/tmp/natearth.sqlite
0:00:37 INF [natural_earth] - 
0:00:37 INF [natural_earth] - Starting...
0:00:47 INF [natural_earth] -  read: [ 349k 100%  35k/s ] write: [  181   18/s ] 186M 
    cpus: 1.6 gc:  0% heap: 1.5G/4.2G direct: 52M postGC: 1.5G
    read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:47 INF [natural_earth] - Finished in 10s cpu:16s avg:1.5
0:00:47 INF [natural_earth] -   read     1x(89% 9s sys:1s)
0:00:47 INF [natural_earth] -   process  2x(18% 2s wait:9s)
0:00:47 INF [natural_earth] -   write    1x(0% 0s wait:10s)
0:00:48 INF [osm_pass1] - 
0:00:48 INF [osm_pass1] - Starting...
0:00:51 INF [osm_pass1:process] - Finished nodes: 4,590,530 (1.4M/s) in 3s cpu:5s avg:1.8
0:00:52 INF [osm_pass1:process] - Finished ways: 330,172 (387k/s) in 0.9s cpu:2s avg:2
0:00:52 INF [osm_pass1:process] - Finished relations: 7,697 (120k/s) in 0.1s cpu:0.1s avg:2
0:00:52 INF [osm_pass1] -  nodes: [ 4.5M 1.1M/s ] 414M  ways: [ 330k  80k/s ] rels: [ 7.6k 1.8k/s ] blocks: [  617  150/s ]
    cpus: 1.8 gc:  0% heap: 2.5G/4.2G direct: 52M postGC: 1.9G hppc: 879k
    read( -%) ->     (0/4) -> parse( -%) ->     (0/4) -> process( -%)
0:00:52 DEB [osm_pass1] - Processed 617 blocks:
0:00:52 DEB [osm_pass1] -   nodes: 4,590,530 (1.4M/s) in 3s cpu:5s avg:1.8
0:00:52 DEB [osm_pass1] -   ways: 330,172 (387k/s) in 0.9s cpu:2s avg:2
0:00:52 DEB [osm_pass1] -   relations: 7,697 (120k/s) in 0.1s cpu:0.1s avg:2
0:00:52 INF [osm_pass1] - Finished in 4s cpu:7s avg:1.8
0:00:52 INF [osm_pass1] -   read     1x(2% 0.1s wait:4s)
0:00:52 INF [osm_pass1] -   parse    1x(57% 2s wait:1s)
0:00:52 INF [osm_pass1] -   process  1x(52% 2s wait:2s)
0:00:52 INF [osm_pass2] - 
0:00:52 INF [osm_pass2] - Starting...
0:00:54 DEB [osm_pass2:process] - Sorting long long multimap...
0:00:54 DEB [osm_pass2:process] - Sorted long long multimap 0s cpu:0s avg:1.8
0:00:54 INF [osm_pass2:process] - Finished nodes: 4,590,530 (1.8M/s) in 2s cpu:5s avg:2
0:00:54 WAR [osm_pass2:process] - No GB polygon for inferring route network types
0:01:02 INF [osm_pass2] -  nodes: [ 4.5M 100% 458k/s ] 414M  ways: [  99k  30% 9.9k/s ] rels: [    0   0%    0/s ] features: [ 4.7M  44k/s ] 220M  blocks: [  586  95%   58/s ]
    cpus: 2 gc:  0% heap: 2G/4.2G direct: 52M postGC: 830M relInfo: 825k mpGeoms: 524k 
    read( 0%) ->   (11/13) -> process(62% 63%) -> (389/53k) -> write( 2%)
0:01:12 INF [osm_pass2] -  nodes: [ 4.5M 100%    0/s ] 414M  ways: [ 277k  84%  17k/s ] rels: [    0   0%    0/s ] features: [ 5.1M  36k/s ] 247M  blocks: [  608  99%    2/s ]
    cpus: 2 gc:  1% heap: 909M/4.2G direct: 52M postGC: 843M relInfo: 825k mpGeoms: 17M  
    read( -%) ->    (7/13) -> process(83% 80%) ->  (1k/53k) -> write( 1%)
0:01:14 INF [osm_pass2:process] - Finished ways: 330,172 (16k/s) in 20s cpu:39s avg:2
0:01:22 INF [osm_pass2] -  nodes: [ 4.5M 100%    0/s ] 414M  ways: [ 330k 100% 5.2k/s ] rels: [ 5.7k  75%  578/s ] features: [ 5.2M  14k/s ] 264M  blocks: [  616 100%   <1/s ]
    cpus: 2 gc:  0% heap: 2.8G/4.2G direct: 52M postGC: 834M relInfo: 825k mpGeoms: 19M  
    read( -%) ->    (0/13) -> process(73% 79%) -> (633/53k) -> write( 1%)
0:01:23 INF [osm_pass2:process] - Finished relations: 7,697 (883/s) in 9s cpu:17s avg:2
0:01:25 INF [osm_pass2] -  nodes: [ 4.5M 100%    0/s ] 414M  ways: [ 330k 100%    0/s ] rels: [ 7.6k 100%  544/s ] features: [ 5.2M 2.9k/s ] 269M  blocks: [  617 100%   <1/s ]
    cpus: 1.9 gc:  1% heap: 2.7G/4.2G direct: 52M postGC: 837M relInfo: 825k mpGeoms: 19M  
    read( -%) ->    (0/13) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:01:25 DEB [osm_pass2] - Processed 617 blocks:
0:01:25 DEB [osm_pass2] -   nodes: 4,590,530 (1.8M/s) in 2s cpu:5s avg:2
0:01:25 DEB [osm_pass2] -   ways: 330,172 (16k/s) in 20s cpu:39s avg:2
0:01:25 DEB [osm_pass2] -   relations: 7,697 (883/s) in 9s cpu:17s avg:2
0:01:25 INF [osm_pass2] - Finished in 34s cpu:1m6s avg:2
0:01:25 INF [osm_pass2] -   read     1x(0% 0s wait:18s done:16s)
0:01:25 INF [osm_pass2] -   process  2x(74% 25s)
0:01:25 INF [osm_pass2] -   write    1x(1% 0.4s wait:33s)
0:01:25 INF [boundaries] - 
0:01:25 INF [boundaries] - Starting...
0:01:25 INF [boundaries] - Creating polygons for 1 boundaries
0:01:25 WAR [boundaries] - Unable to form closed polygon for OSM relation 148838 (likely missing edges)
0:01:25 INF [boundaries] - Finished creating 0 country polygons
0:01:25 INF [boundaries] - Finished in 0s cpu:0.1s avg:1.5
0:01:25 INF - Deleting node.db to make room for output file
0:01:25 INF [sort] - 
0:01:25 INF [sort] - Starting...
0:01:25 INF [sort] - Grouped 8 chunks into 1
0:01:29 INF [sort] -  chunks: [   1 /   1 100% ] 269M 
    cpus: 1.2 gc:  0% heap: 3.2G/4.2G direct: 52M postGC: 837M
    ->     (0/3) -> worker(86%)
0:01:29 INF [sort] - Finished in 4s cpu:5s avg:1.2
0:01:29 INF [sort] -   worker  1x(85% 4s)
0:01:29 INF - read:1s write:1s sort:1s
0:01:30 INF [mbtiles] - 
0:01:30 INF [mbtiles] - Starting...
0:01:30 DEB [mbtiles:write] - Execute mbtiles: create table metadata (name text, value text);
0:01:30 DEB [mbtiles:write] - Execute mbtiles: create unique index name on metadata (name);
0:01:30 DEB [mbtiles:write] - Execute mbtiles: create table tiles (zoom_level integer, tile_column integer, tile_row, tile_data blob);
0:01:30 DEB [mbtiles:write] - Execute mbtiles: create unique index tile_index on tiles (zoom_level, tile_column, tile_row)
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: name=OpenMapTiles
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: format=pbf
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a>
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: version=3.13.1
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: type=baselayer
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: bounds=-74.07,21.34,-17.84,43.55
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: center=-45.955,32.445,3
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: minzoom=0
0:01:30 DEB [mbtiles:write] - Set mbtiles metadata: maxzoom=14
0:01:31 DEB [mbtiles:write] - Set mbtiles metadata: json={"vector_layers":[{"id":"aerodrome_label","fields":{"name_int":"String","iata":"String","ele_ft":"Number","name_de":"String","name":"String","icao":"String","name:en":"String","class":"String","ele":"Number","name_en":"String","name:latin":"String"},"minzoom":10,"maxzoom":14},{"id":"aeroway","fields":{"ref":"String","class":"String"},"minzoom":10,"maxzoom":14},{"id":"boundary","fields":{"disputed":"Number","admin_level":"Number","maritime":"Number","disputed_name":"String"},"minzoom":0,"maxzoom":14},{"id":"building","fields":{"colour":"String","render_height":"Number","render_min_height":"Number"},"minzoom":13,"maxzoom":14},{"id":"housenumber","fields":{"housenumber":"String"},"minzoom":14,"maxzoom":14},{"id":"landcover","fields":{"subclass":"String","class":"String","_numpoints":"Number"},"minzoom":7,"maxzoom":14},{"id":"landuse","fields":{"class":"String"},"minzoom":4,"maxzoom":14},{"id":"mountain_peak","fields":{"name_int":"String","customary_ft":"Number","ele_ft":"Number","name_de":"String","name":"String","rank":"Number","class":"String","name_en":"String","name:latin":"String","ele":"Number"},"minzoom":7,"maxzoom":14},{"id":"park","fields":{"name_int":"String","name_de":"String","name":"String","name:en":"String","class":"String","name_en":"String","name:latin":"String"},"minzoom":6,"maxzoom":14},{"id":"place","fields":{"name:fy":"String","name_int":"String","capital":"Number","name:uk":"String","name:pl":"String","name:nl":"String","name:be":"String","name:ru":"String","name:ko":"String","name_de":"String","name":"String","rank":"Number","name:en":"String","name:eo":"String","class":"String","name:hu":"String","name:ta":"String","name:zh":"String","name_en":"String","name:latin":"String"},"minzoom":2,"maxzoom":14},{"id":"poi","fields":{"name_int":"String","level":"Number","name_de":"String","name":"String","subclass":"String","indoor":"Number","name:en":"String","class":"String","layer":"Number","name:zh":"String","name_en":"String","name:latin":"String"},"minzoom":12,"maxzoom":14},{"id":"transportation","fields":{"access":"String","brunnel":"String","expressway":"Number","surface":"String","bicycle":"String","level":"Number","ramp":"Number","mtb_scale":"String","toll":"Number","oneway":"Number","layer":"Number","network":"String","horse":"String","service":"String","subclass":"String","class":"String","foot":"String"},"minzoom":4,"maxzoom":14},{"id":"transportation_name","fields":{"name_int":"String","route_4":"String","route_3":"String","route_2":"String","route_1":"String","layer":"Number","network":"String","ref":"String","name_de":"String","name":"String","subclass":"String","ref_length":"Number","class":"String","name_en":"String","name:latin":"String"},"minzoom":6,"maxzoom":14},{"id":"water","fields":{"intermittent":"Number","id":"Number","class":"String"},"minzoom":0,"maxzoom":14},{"id":"water_name","fields":{"name_int":"String","name_de":"String","name":"String","intermittent":"Number","class":"String","name_en":"String","name:latin":"String"},"minzoom":9,"maxzoom":14},{"id":"waterway","fields":{"name_int":"String","brunnel":"String","name_de":"String","_relid":"Number","intermittent":"Number","name":"String","class":"String","name:latin":"String","name_en":"String"},"minzoom":4,"maxzoom":14}]}
0:01:31 INF [mbtiles:write] - Starting z0
0:01:31 INF [mbtiles:write] - Finished z0 in 0s cpu:0s avg:0, now starting z1
0:01:31 INF [mbtiles:write] - Finished z1 in 0s cpu:0s avg:0, now starting z2
0:01:31 INF [mbtiles:write] - Finished z2 in 0s cpu:0s avg:0, now starting z3
0:01:31 INF [mbtiles:write] - Finished z3 in 0s cpu:0s avg:0, now starting z4
0:01:31 INF [mbtiles:write] - Finished z4 in 0s cpu:0s avg:0, now starting z5
0:01:31 INF [mbtiles:write] - Finished z5 in 0s cpu:0s avg:0, now starting z6
0:01:31 INF [mbtiles:write] - Finished z6 in 0s cpu:0s avg:0, now starting z7
0:01:32 INF [mbtiles:write] - Finished z7 in 0.7s cpu:1s avg:2, now starting z8
0:01:33 INF [mbtiles:write] - Finished z8 in 0.9s cpu:2s avg:2, now starting z9
0:01:34 INF [mbtiles:write] - Finished z9 in 2s cpu:3s avg:2, now starting z10
0:01:34 INF [mbtiles:write] - Finished z10 in 0.1s cpu:0.1s avg:2, now starting z11
0:01:35 INF [mbtiles:write] - Finished z11 in 0.4s cpu:0.8s avg:2, now starting z12
0:01:37 INF [mbtiles:write] - Finished z12 in 3s cpu:5s avg:2, now starting z13
0:01:40 INF [mbtiles] -  features: [ 630k  12%  60k/s ] 269M  tiles: [ 290k  27k/s ] 45M  
    cpus: 2 gc: 10% heap: 2.1G/4.2G direct: 52M postGC: 832M
    read( 4%) -> (212/217) -> encode(55% 55%) -> (215/216) -> write( 8%)
    last tile: 13/2468/3042 (z13 4%) https://www.openstreetmap.org/#map=13/41.96766/-71.54297
0:01:50 INF [mbtiles] -  features: [ 1.7M  33% 113k/s ] 269M  tiles: [ 970k  67k/s ] 128M 
    cpus: 1.9 gc:  2% heap: 1.1G/4.2G direct: 52M postGC: 817M
    read( 4%) ->  (35/217) -> encode(55% 58%) -> (209/216) -> write(17%)
    last tile: 13/3593/3357 (z13 92%) https://www.openstreetmap.org/#map=13/30.86451/-22.10449
0:01:51 INF [mbtiles:write] - Finished z13 in 13s cpu:26s avg:2, now starting z14
0:01:59 INF [mbtiles:write] - Finished z14 in 9s cpu:14s avg:1.5
0:01:59 INF [mbtiles] -  features: [ 5.2M 100% 380k/s ] 269M  tiles: [ 4.1M 340k/s ] 515M 
    cpus: 1.5 gc:  1% heap: 1.9G/4.2G direct: 52M postGC: 817M
    read( -%) ->   (0/217) -> encode( -%  -%) ->   (0/216) -> write( -%)
    last tile: 14/7380/5985 (z14 100%) https://www.openstreetmap.org/#map=14/43.56447/-17.84180
0:01:59 DEB [mbtiles] - Tile stats:
0:01:59 DEB [mbtiles] - z0 avg:7.9k max:7.9k
0:01:59 DEB [mbtiles] - z1 avg:4k max:4k
0:01:59 DEB [mbtiles] - z2 avg:9.4k max:9.4k
0:01:59 DEB [mbtiles] - z3 avg:3.9k max:6.4k
0:01:59 DEB [mbtiles] - z4 avg:1.6k max:4.6k
0:01:59 DEB [mbtiles] - z5 avg:1.4k max:8.1k
0:01:59 DEB [mbtiles] - z6 avg:1.4k max:24k
0:01:59 DEB [mbtiles] - z7 avg:1k max:56k
0:01:59 DEB [mbtiles] - z8 avg:476 max:113k
0:01:59 DEB [mbtiles] - z9 avg:298 max:279k
0:01:59 DEB [mbtiles] - z10 avg:165 max:233k
0:01:59 DEB [mbtiles] - z11 avg:108 max:132k
0:01:59 DEB [mbtiles] - z12 avg:86 max:119k
0:01:59 DEB [mbtiles] - z13 avg:72 max:109k
0:01:59 DEB [mbtiles] - z14 avg:68 max:257k
0:01:59 DEB [mbtiles] - all avg:71 max:0
0:01:59 DEB [mbtiles] -  # features: 5,289,610
0:01:59 DEB [mbtiles] -     # tiles: 4,115,450
0:01:59 INF [mbtiles] - Finished in 30s cpu:54s gc:1s avg:1.8
0:01:59 INF [mbtiles] -   read    1x(7% 2s wait:24s)
0:01:59 INF [mbtiles] -   encode  2x(45% 13s wait:7s)
0:01:59 INF [mbtiles] -   write   1x(35% 10s sys:1s wait:16s)
0:02:00 INF - Finished in 2m cpu:3m30s gc:4s avg:1.8
0:02:00 INF - FINISHED!
0:02:00 INF - 
0:02:00 INF - ----------------------------------------
0:02:00 INF - 	overall          2m cpu:3m30s gc:4s avg:1.8
0:02:00 INF - 	lake_centerlines 2s cpu:4s avg:1.7
0:02:00 INF - 	  read     1x(79% 2s)
0:02:00 INF - 	  process  2x(12% 0.3s wait:2s)
0:02:00 INF - 	  write    1x(0% 0s wait:2s)
0:02:00 INF - 	water_polygons   27s cpu:46s gc:2s avg:1.7
0:02:00 INF - 	  read     1x(60% 16s sys:1s wait:2s)
0:02:00 INF - 	  process  2x(27% 7s wait:14s)
0:02:00 INF - 	  write    1x(4% 0.9s wait:26s)
0:02:00 INF - 	natural_earth    10s cpu:16s avg:1.5
0:02:00 INF - 	  read     1x(89% 9s sys:1s)
0:02:00 INF - 	  process  2x(18% 2s wait:9s)
0:02:00 INF - 	  write    1x(0% 0s wait:10s)
0:02:00 INF - 	osm_pass1        4s cpu:7s avg:1.8
0:02:00 INF - 	  read     1x(2% 0.1s wait:4s)
0:02:00 INF - 	  parse    1x(57% 2s wait:1s)
0:02:00 INF - 	  process  1x(52% 2s wait:2s)
0:02:00 INF - 	osm_pass2        34s cpu:1m6s avg:2
0:02:00 INF - 	  read     1x(0% 0s wait:18s done:16s)
0:02:00 INF - 	  process  2x(74% 25s)
0:02:00 INF - 	  write    1x(1% 0.4s wait:33s)
0:02:00 INF - 	boundaries       0s cpu:0.1s avg:1.5
0:02:00 INF - 	sort             4s cpu:5s avg:1.2
0:02:00 INF - 	  worker  1x(85% 4s)
0:02:00 INF - 	mbtiles          30s cpu:54s gc:1s avg:1.8
0:02:00 INF - 	  read    1x(7% 2s wait:24s)
0:02:00 INF - 	  encode  2x(45% 13s wait:7s)
0:02:00 INF - 	  write   1x(35% 10s sys:1s wait:16s)
0:02:00 INF - ----------------------------------------
0:02:00 INF - 	features	269MB
0:02:00 INF - 	mbtiles	515MB
-rw-r--r-- 1 runner docker 55M May 24 17:36 run.jar
ℹ️ This Branch Logs bb5f4c6
0:00:00 DEB - argument: config=null (path to config file)
0:00:00 DEB - argument: area=rhode island (name of the extract to download if osm_url/osm_path not specified (i.e. 'monaco' 'rhode island' 'australia' or 'planet'))
0:00:00 INF - Using in-memory stats
0:00:00 INF [overall] - 
0:00:00 INF [overall] - Starting...
0:00:00 DEB - argument: madvise=true (default value for whether to use linux madvise(random) to improve memory-mapped read performance for temporary storage)
0:00:00 DEB - argument: storage=mmap (default storage type for temporary data, one of [ram, mmap, direct])
0:00:00 DEB - argument: threads=2 (num threads)
0:00:00 DEB - argument: write_threads=1 (number of threads to use when writing temp features)
0:00:00 DEB - argument: process_threads=2 (number of threads to use when processing input features)
0:00:00 DEB - argument: bounds=Env[-74.07 : -17.84, 21.34 : 43.55] (bounds)
0:00:00 DEB - argument: loginterval=10 seconds (time between logs)
0:00:00 DEB - argument: minzoom=0 (minimum zoom level)
0:00:00 DEB - argument: maxzoom=14 (maximum zoom level (limit 14))
0:00:00 DEB - argument: defer_mbtiles_index_creation=false (skip adding index to mbtiles file)
0:00:00 DEB - argument: optimize_db=false (optimize mbtiles after writing)
0:00:00 DEB - argument: emit_tiles_in_order=true (emit tiles in index order)
0:00:00 DEB - argument: force=false (overwriting output file and ignore disk/RAM warnings)
0:00:00 DEB - argument: gzip_temp=false (gzip temporary feature storage (uses more CPU, but less disk space))
0:00:00 DEB - argument: mmap_temp=false (use memory-mapped IO for temp feature files)
0:00:00 DEB - argument: sort_max_readers=6 (maximum number of concurrent read threads to use when sorting chunks)
0:00:00 DEB - argument: sort_max_writers=6 (maximum number of concurrent write threads to use when sorting chunks)
0:00:00 DEB - argument: nodemap_type=sparsearray (type of node location map, one of [noop, sortedtable, sparsearray, array])
0:00:00 DEB - argument: nodemap_storage=mmap (storage for node location map, one of [ram, mmap, direct])
0:00:00 DEB - argument: nodemap_madvise=true (use linux madvise(random) for node locations)
0:00:00 DEB - argument: multipolygon_geometry_storage=mmap (storage for multipolygon geometries, one of [ram, mmap, direct])
0:00:00 DEB - argument: multipolygon_geometry_madvise=true (use linux madvise(random) for temporary multipolygon geometry storage)
0:00:00 DEB - argument: http_user_agent=Planetiler downloader (https://github.com/onthegomap/planetiler) (User-Agent header to set when downloading files over HTTP)
0:00:00 DEB - argument: http_timeout=30 seconds (Timeout to use when downloading files over HTTP)
0:00:00 DEB - argument: http_retries=1 (Retries to use when downloading files over HTTP)
0:00:00 DEB - argument: download_chunk_size_mb=100 (Size of file chunks to download in parallel in megabytes)
0:00:00 DEB - argument: download_threads=1 (Number of parallel threads to use when downloading each file)
0:00:00 DEB - argument: min_feature_size_at_max_zoom=0.0625 (Default value for the minimum size in tile pixels of features to emit at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: min_feature_size=1.0 (Default value for the minimum size in tile pixels of features to emit below the maximum zoom level)
0:00:00 DEB - argument: simplify_tolerance_at_max_zoom=0.0625 (Default value for the tile pixel tolerance to use when simplifying features at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: simplify_tolerance=0.1 (Default value for the tile pixel tolerance to use when simplifying features below the maximum zoom level)
0:00:00 DEB - argument: osm_lazy_reads=false (Read OSM blocks from disk in worker threads)
0:00:00 DEB - argument: compact_db=false (Reduce the DB size by separating and deduping the tile data)
0:00:00 DEB - argument: tmpdir=data/tmp (temp directory)
0:00:00 DEB - argument: only_download=false (download source data then exit)
0:00:00 DEB - argument: download=false (download sources)
0:00:00 DEB - argument: temp_nodes=data/tmp/node.db (temp node db location)
0:00:00 DEB - argument: temp_multipolygons=data/tmp/multipolygon.db (temp multipolygon db location)
0:00:00 DEB - argument: temp_features=data/tmp/feature.db (temp feature db location)
0:00:00 DEB - argument: only_fetch_wikidata=false (fetch wikidata translations then quit)
0:00:00 DEB - argument: fetch_wikidata=false (fetch wikidata translations then continue)
0:00:00 DEB - argument: use_wikidata=true (use wikidata translations)
0:00:00 DEB - argument: wikidata_cache=data/sources/wikidata_names.json (wikidata cache file)
0:00:00 DEB - argument: lake_centerlines_path=data/sources/lake_centerline.shp.zip (lake_centerlines shapefile path)
0:00:00 DEB - argument: free_lake_centerlines_after_read=false (delete lake_centerlines input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: water_polygons_path=data/sources/water-polygons-split-3857.zip (water_polygons shapefile path)
0:00:00 DEB - argument: free_water_polygons_after_read=false (delete water_polygons input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: natural_earth_path=data/sources/natural_earth_vector.sqlite.zip (natural_earth sqlite db path)
0:00:00 DEB - argument: free_natural_earth_after_read=false (delete natural_earth input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: osm_path=data/sources/rhode_island.osm.pbf (osm OSM input file path)
0:00:00 DEB - argument: free_osm_after_read=false (delete osm input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: mbtiles=data/out.mbtiles (mbtiles output file)
0:00:00 DEB - argument: transliterate=true (attempt to transliterate latin names)
0:00:00 DEB - argument: languages=am,ar,az,be,bg,br,bs,ca,co,cs,cy,da,de,el,en,eo,es,et,eu,fi,fr,fy,ga,gd,he,hi,hr,hu,hy,id,is,it,ja,ja_kana,ja_rm,ja-Latn,ja-Hira,ka,kk,kn,ko,ko-Latn,ku,la,lb,lt,lv,mk,mt,ml,nl,no,oc,pl,pt,rm,ro,ru,sk,sl,sq,sr,sr-Latn,sv,ta,te,th,tr,uk,zh (languages to use)
0:00:00 DEB - argument: only_layers= (Include only certain layers)
0:00:00 DEB - argument: exclude_layers= (Exclude certain layers)
0:00:00 DEB - argument: boundary_country_names=true (boundary layer: add left/right codes of neighboring countries)
0:00:00 DEB - argument: transportation_z13_paths=false (transportation(_name) layer: show all paths on z13)
0:00:00 DEB - argument: building_merge_z13=true (building layer: merge nearby buildings at z13)
0:00:00 DEB - argument: transportation_name_brunnel=false (transportation_name layer: set to false to omit brunnel and help merge long highways)
0:00:00 DEB - argument: transportation_name_size_for_shield=false (transportation_name layer: allow road names on shorter segments (ie. they will have a shield))
0:00:00 DEB - argument: transportation_name_limit_merge=false (transportation_name layer: limit merge so we don't combine different relations to help merge long highways)
0:00:00 DEB - argument: transportation_name_minor_refs=false (transportation_name layer: include name and refs from minor road networks if not present on a way)
0:00:00 DEB - argument: mbtiles_name=OpenMapTiles ('name' attribute for mbtiles metadata)
0:00:00 DEB - argument: mbtiles_description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org ('description' attribute for mbtiles metadata)
0:00:00 DEB - argument: mbtiles_attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a> ('attribution' attribute for mbtiles metadata)
0:00:00 DEB - argument: mbtiles_version=3.13.1 ('version' attribute for mbtiles metadata)
0:00:00 DEB - argument: mbtiles_type=baselayer ('type' attribute for mbtiles metadata)
0:00:00 DEB - argument: help=false (show arguments then exit)
0:00:00 INF - Building BasemapProfile profile into data/out.mbtiles in these phases:
0:00:00 INF -   lake_centerlines: Process features in data/sources/lake_centerline.shp.zip
0:00:00 INF -   water_polygons: Process features in data/sources/water-polygons-split-3857.zip
0:00:00 INF -   natural_earth: Process features in data/sources/natural_earth_vector.sqlite.zip
0:00:00 INF -   osm_pass1: Pre-process OpenStreetMap input (store node locations then relation members)
0:00:00 INF -   osm_pass2: Process OpenStreetMap nodes, ways, then relations
0:00:00 INF -   sort: Sort rendered features by tile ID
0:00:00 INF -   mbtiles: Encode each tile and write to data/out.mbtiles
0:00:01 INF - no wikidata translations found, run with --fetch-wikidata to download
0:00:01 DEB - ✓ 194M storage on / (/dev/root) requested for read phase disk, 30G available
0:00:01 DEB -  - 43M used for temporary node location cache
0:00:01 DEB -  - 6.6M used for temporary multipolygon geometry cache
0:00:01 DEB -  - 144M used for temporary feature storage
0:00:01 DEB - ✓ 216M storage on / (/dev/root) requested for write phase disk, 30G available
0:00:01 DEB -  - 144M used for temporary feature storage
0:00:01 DEB -  - 72M used for mbtiles output
0:00:01 DEB - ✓ 312M JVM heap requested for read phase, 4.2G available
0:00:01 DEB -  - 300M used for sparsearray node location in-memory index
0:00:01 DEB -  - 12M used for temporary profile storage
0:00:01 DEB - ✓ 50M storage on / (/dev/root) requested for read phase, 30G available
0:00:01 DEB -  - 43M used for sparsearray node location cache
0:00:01 DEB -  - 6.6M used for multipolygon way geometries
0:00:01 DEB - ✓ 50M temporary files and 2.9G of free memory for OS to cache them
0:00:01 INF - Using merge sort feature map, chunk size=1431mb max workers=2
0:00:02 INF - dataFileCache open start
0:00:03 INF [lake_centerlines] - 
0:00:03 INF [lake_centerlines] - Starting...
0:00:05 INF [lake_centerlines] -  read: [  59k 100%  28k/s ] write: [    0    0/s ] 0    
    cpus: 1.8 gc:  3% heap: 173M/4.2G direct: 237k postGC: 79M
    read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:05 INF [lake_centerlines] - Finished in 2s cpu:4s avg:1.8
0:00:05 INF [lake_centerlines] -   read     1x(85% 2s)
0:00:05 INF [lake_centerlines] -   process  2x(13% 0.3s wait:2s)
0:00:05 INF [lake_centerlines] -   write    1x(0% 0s wait:2s)
0:00:05 INF [water_polygons] - 
0:00:05 INF [water_polygons] - Starting...
0:00:15 INF [water_polygons] -  read: [ 2.2k  15%  220/s ] write: [  46k 4.5k/s ] 3.1M 
    cpus: 2 gc:  9% heap: 2.7G/4.2G direct: 52M postGC: 948M
    read(60%) ->    (1/1k) -> process(45% 11%) -> (1.2k/53k) -> write( 0%)
0:00:25 INF [water_polygons] -  read: [ 4.9k  34%  276/s ] write: [ 300k  25k/s ] 15M  
    cpus: 1.7 gc: 10% heap: 2.7G/4.2G direct: 52M postGC: 1.5G
    read(63%) ->    (0/1k) -> process(28% 25%) -> (251/53k) -> write( 1%)
0:00:31 INF [water_polygons] -  read: [  14k 100% 1.4k/s ] write: [ 4.3M 615k/s ] 186M 
    cpus: 1.5 gc:  7% heap: 1.9G/4.2G direct: 52M postGC: 1.7G
    read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:31 INF [water_polygons] - Finished in 27s cpu:47s gc:2s avg:1.8
0:00:31 INF [water_polygons] -   read     1x(59% 16s wait:3s)
0:00:31 INF [water_polygons] -   process  2x(28% 7s wait:12s)
0:00:31 INF [water_polygons] -   write    1x(4% 1s wait:25s)
0:00:31 INF [natural_earth] - unzipping /home/runner/work/planetiler/planetiler/data/sources/natural_earth_vector.sqlite.zip to data/tmp/natearth.sqlite
0:00:40 INF [natural_earth] - 
0:00:40 INF [natural_earth] - Starting...
0:00:51 INF [natural_earth] -  read: [ 329k  94%  32k/s ] write: [    0    0/s ] 186M 
    cpus: 1.5 gc:  1% heap: 3G/4.2G direct: 52M postGC: 1.7G
    read(88%) ->    (0/1k) -> process(16% 18%) -> (132/53k) -> write( 0%)
0:00:51 INF [natural_earth] -  read: [ 349k 100%  25k/s ] write: [  181  231/s ] 186M 
    cpus: 1.9 gc:  0% heap: 3.2G/4.2G direct: 52M postGC: 1.7G
    read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:51 INF [natural_earth] - Finished in 12s cpu:17s avg:1.4
0:00:51 INF [natural_earth] -   read     1x(80% 9s sys:1s done:1s)
0:00:51 INF [natural_earth] -   process  2x(16% 2s wait:10s done:1s)
0:00:51 INF [natural_earth] -   write    1x(0% 0s wait:11s done:1s)
0:00:51 INF [osm_pass1] - 
0:00:51 INF [osm_pass1] - Starting...
0:00:55 INF [osm_pass1:process] - Finished nodes: 4,590,530 (1.4M/s) in 3s cpu:6s avg:1.8
0:00:56 INF [osm_pass1:process] - Finished ways: 330,172 (397k/s) in 0.8s cpu:2s avg:2
0:00:56 INF [osm_pass1:process] - Finished relations: 7,697 (96k/s) in 0.1s cpu:0.1s avg:1.9
0:00:56 INF [osm_pass1] -  nodes: [ 4.5M   1M/s ] 414M  ways: [ 330k  78k/s ] rels: [ 7.6k 1.8k/s ] blocks: [  617  147/s ]
    cpus: 1.8 gc:  1% heap: 793M/4.2G direct: 52M postGC: 837M hppc: 879k
    read( -%) ->     (0/4) -> parse( -%) ->     (0/4) -> process( -%)
0:00:56 DEB [osm_pass1] - Processed 617 blocks:
0:00:56 DEB [osm_pass1] -   nodes: 4,590,530 (1.4M/s) in 3s cpu:6s avg:1.8
0:00:56 DEB [osm_pass1] -   ways: 330,172 (397k/s) in 0.8s cpu:2s avg:2
0:00:56 DEB [osm_pass1] -   relations: 7,697 (96k/s) in 0.1s cpu:0.1s avg:1.9
0:00:56 INF [osm_pass1] - Finished in 4s cpu:8s avg:1.8
0:00:56 INF [osm_pass1] -   read     1x(3% 0.1s wait:4s)
0:00:56 INF [osm_pass1] -   parse    1x(58% 2s wait:1s)
0:00:56 INF [osm_pass1] -   process  1x(53% 2s wait:1s)
0:00:56 INF [osm_pass2] - 
0:00:56 INF [osm_pass2] - Starting...
0:00:58 DEB [osm_pass2:process] - Sorting long long multimap...
0:00:58 INF [osm_pass2:process] - Finished nodes: 4,590,530 (1.8M/s) in 3s cpu:5s avg:2
0:00:58 DEB [osm_pass2:process] - Sorted long long multimap 0s cpu:0.1s avg:2.7
0:00:58 WAR [osm_pass2:process] - No GB polygon for inferring route network types
0:01:06 INF [osm_pass2] -  nodes: [ 4.5M 100% 458k/s ] 414M  ways: [ 123k  38%  12k/s ] rels: [    0   0%    0/s ] features: [ 4.8M  50k/s ] 223M  blocks: [  588  95%   58/s ]
    cpus: 2 gc:  0% heap: 3G/4.2G direct: 52M postGC: 849M relInfo: 825k mpGeoms: 574k 
    read( 0%) ->   (11/13) -> process(62% 64%) -> (1.2k/53k) -> write( 2%)
0:01:16 INF [osm_pass2] -  nodes: [ 4.5M 100%    0/s ] 414M  ways: [ 311k  94%  18k/s ] rels: [    0   0%    0/s ] features: [ 5.2M  38k/s ] 251M  blocks: [  612  99%    2/s ]
    cpus: 2 gc:  1% heap: 2.1G/4.2G direct: 52M postGC: 860M relInfo: 825k mpGeoms: 19M  
    read( -%) ->    (3/13) -> process(88% 86%) -> (1.2k/53k) -> write( 1%)
0:01:16 INF [osm_pass2:process] - Finished ways: 330,172 (18k/s) in 18s cpu:36s avg:2
0:01:25 INF [osm_pass2:process] - Finished relations: 7,697 (902/s) in 9s cpu:17s avg:2
0:01:26 INF [osm_pass2] -  nodes: [ 4.5M 100%    0/s ] 414M  ways: [ 330k 100% 1.8k/s ] rels: [   7k  91%  699/s ] features: [ 5.2M 7.9k/s ] 266M  blocks: [  617 100%   <1/s ]
    cpus: 2 gc:  1% heap: 1.7G/4.2G direct: 52M postGC: 851M relInfo: 825k mpGeoms: 19M  
    read( -%) ->    (0/13) -> process(80% 81%) -> (1.1k/53k) -> write( 0%)
0:01:27 INF [osm_pass2] -  nodes: [ 4.5M 100%    0/s ] 414M  ways: [ 330k 100%    0/s ] rels: [ 7.6k 100%  399/s ] features: [ 5.2M   3k/s ] 269M  blocks: [  617 100%    0/s ]
    cpus: 2 gc:  0% heap: 2.9G/4.2G direct: 52M postGC: 851M relInfo: 825k mpGeoms: 19M  
    read( -%) ->    (0/13) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:01:27 DEB [osm_pass2] - Processed 617 blocks:
0:01:27 DEB [osm_pass2] -   nodes: 4,590,530 (1.8M/s) in 3s cpu:5s avg:2
0:01:27 DEB [osm_pass2] -   ways: 330,172 (18k/s) in 18s cpu:36s avg:2
0:01:27 DEB [osm_pass2] -   relations: 7,697 (902/s) in 9s cpu:17s avg:2
0:01:27 INF [osm_pass2] - Finished in 32s cpu:1m3s avg:2
0:01:27 INF [osm_pass2] -   read     1x(0% 0.1s wait:16s done:16s)
0:01:27 INF [osm_pass2] -   process  2x(77% 25s)
0:01:27 INF [osm_pass2] -   write    1x(1% 0.4s wait:31s)
0:01:27 INF [boundaries] - 
0:01:27 INF [boundaries] - Starting...
0:01:27 INF [boundaries] - Creating polygons for 1 boundaries
0:01:28 WAR [boundaries] - Unable to form closed polygon for OSM relation 148838 (likely missing edges)
0:01:28 INF [boundaries] - Finished creating 0 country polygons
0:01:28 INF [boundaries] - Finished in 0.1s cpu:0.1s avg:1.5
0:01:28 INF - Deleting node.db to make room for output file
0:01:28 INF [sort] - 
0:01:28 INF [sort] - Starting...
0:01:28 INF [sort] - Grouped 8 chunks into 1
0:01:32 INF [sort] -  chunks: [   1 /   1 100% ] 269M 
    cpus: 1.2 gc: 10% heap: 605M/4.2G direct: 52M postGC: 543M
    ->     (0/3) -> worker( -%)
0:01:32 INF [sort] - Finished in 5s cpu:6s avg:1.2
0:01:32 INF [sort] -   worker  1x(73% 4s)
0:01:32 INF - read:2s write:1s sort:1s
0:01:32 INF [mbtiles] - 
0:01:32 INF [mbtiles] - Starting...
0:01:32 DEB [mbtiles:write] - Execute mbtiles: create table metadata (name text, value text);
0:01:32 DEB [mbtiles:write] - Execute mbtiles: create unique index name on metadata (name);
0:01:32 DEB [mbtiles:write] - Execute mbtiles: create table tiles (zoom_level integer, tile_column integer, tile_row, tile_data blob);
0:01:32 DEB [mbtiles:write] - Execute mbtiles: create unique index tile_index on tiles (zoom_level, tile_column, tile_row)
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: name=OpenMapTiles
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: format=pbf
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a>
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: version=3.13.1
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: type=baselayer
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: bounds=-74.07,21.34,-17.84,43.55
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: center=-45.955,32.445,3
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: minzoom=0
0:01:32 DEB [mbtiles:write] - Set mbtiles metadata: maxzoom=14
0:01:33 DEB [mbtiles:write] - Set mbtiles metadata: json={"vector_layers":[{"id":"aerodrome_label","fields":{"name_int":"String","iata":"String","ele_ft":"Number","name_de":"String","name":"String","icao":"String","name:en":"String","class":"String","ele":"Number","name_en":"String","name:latin":"String"},"minzoom":10,"maxzoom":14},{"id":"aeroway","fields":{"ref":"String","class":"String"},"minzoom":10,"maxzoom":14},{"id":"boundary","fields":{"disputed":"Number","admin_level":"Number","maritime":"Number","disputed_name":"String"},"minzoom":0,"maxzoom":14},{"id":"building","fields":{"colour":"String","render_height":"Number","render_min_height":"Number"},"minzoom":13,"maxzoom":14},{"id":"housenumber","fields":{"housenumber":"String"},"minzoom":14,"maxzoom":14},{"id":"landcover","fields":{"subclass":"String","class":"String","_numpoints":"Number"},"minzoom":7,"maxzoom":14},{"id":"landuse","fields":{"class":"String"},"minzoom":4,"maxzoom":14},{"id":"mountain_peak","fields":{"name_int":"String","customary_ft":"Number","ele_ft":"Number","name_de":"String","name":"String","rank":"Number","class":"String","name_en":"String","name:latin":"String","ele":"Number"},"minzoom":7,"maxzoom":14},{"id":"park","fields":{"name_int":"String","name_de":"String","name":"String","name:en":"String","class":"String","name_en":"String","name:latin":"String"},"minzoom":6,"maxzoom":14},{"id":"place","fields":{"name:fy":"String","name_int":"String","capital":"Number","name:uk":"String","name:pl":"String","name:nl":"String","name:be":"String","name:ru":"String","name:ko":"String","name_de":"String","name":"String","rank":"Number","name:en":"String","name:eo":"String","class":"String","name:hu":"String","name:ta":"String","name:zh":"String","name_en":"String","name:latin":"String"},"minzoom":2,"maxzoom":14},{"id":"poi","fields":{"name_int":"String","level":"Number","name_de":"String","name":"String","subclass":"String","indoor":"Number","name:en":"String","class":"String","layer":"Number","name:zh":"String","name_en":"String","name:latin":"String"},"minzoom":12,"maxzoom":14},{"id":"transportation","fields":{"brunnel":"String","access":"String","expressway":"Number","bicycle":"String","surface":"String","level":"Number","ramp":"Number","mtb_scale":"String","toll":"Number","layer":"Number","oneway":"Number","network":"String","horse":"String","service":"String","subclass":"String","class":"String","foot":"String"},"minzoom":4,"maxzoom":14},{"id":"transportation_name","fields":{"name_int":"String","route_4":"String","route_3":"String","route_2":"String","route_1":"String","layer":"Number","network":"String","ref":"String","name_de":"String","name":"String","subclass":"String","ref_length":"Number","class":"String","name_en":"String","name:latin":"String"},"minzoom":6,"maxzoom":14},{"id":"water","fields":{"intermittent":"Number","id":"Number","class":"String"},"minzoom":0,"maxzoom":14},{"id":"water_name","fields":{"name_int":"String","name_de":"String","name":"String","intermittent":"Number","class":"String","name_en":"String","name:latin":"String"},"minzoom":9,"maxzoom":14},{"id":"waterway","fields":{"name_int":"String","brunnel":"String","name_de":"String","_relid":"Number","intermittent":"Number","name":"String","class":"String","name:latin":"String","name_en":"String"},"minzoom":4,"maxzoom":14}]}
0:01:33 INF [mbtiles:write] - Starting z0
0:01:33 INF [mbtiles:write] - Finished z0 in 0s cpu:0s avg:0, now starting z1
0:01:33 INF [mbtiles:write] - Finished z1 in 0s cpu:0s avg:0, now starting z2
0:01:33 INF [mbtiles:write] - Finished z2 in 0s cpu:0s avg:0, now starting z3
0:01:33 INF [mbtiles:write] - Finished z3 in 0s cpu:0s avg:0, now starting z4
0:01:33 INF [mbtiles:write] - Finished z4 in 0s cpu:0s avg:72.8, now starting z5
0:01:33 INF [mbtiles:write] - Finished z5 in 0s cpu:0s avg:0, now starting z6
0:01:33 INF [mbtiles:write] - Finished z6 in 0s cpu:0s avg:0, now starting z7
0:01:34 INF [mbtiles:write] - Finished z7 in 0.8s cpu:2s avg:2, now starting z8
0:01:34 INF [mbtiles:write] - Finished z8 in 0.6s cpu:1s avg:2, now starting z9
0:01:36 INF [mbtiles:write] - Finished z9 in 2s cpu:3s avg:2, now starting z10
0:01:36 INF [mbtiles:write] - Finished z10 in 0.1s cpu:0.1s avg:1.9, now starting z11
0:01:37 INF [mbtiles:write] - Finished z11 in 0.8s cpu:2s avg:2, now starting z12
0:01:39 INF [mbtiles:write] - Finished z12 in 2s cpu:4s avg:2, now starting z13
0:01:42 INF [mbtiles] -  features: [ 634k  12%  63k/s ] 269M  tiles: [ 290k  29k/s ] 45M  
    cpus: 2 gc:  4% heap: 1.8G/4.2G direct: 52M postGC: 846M
    read( 4%) -> (212/217) -> encode(57% 55%) -> (215/216) -> write( 8%)
    last tile: 13/2468/3042 (z13 4%) https://www.openstreetmap.org/#map=13/41.96766/-71.54297
0:01:52 INF [mbtiles:write] - Finished z13 in 13s cpu:26s avg:2, now starting z14
0:01:52 INF [mbtiles] -  features: [ 1.8M  34% 118k/s ] 269M  tiles: [   1M  79k/s ] 142M 
    cpus: 2 gc:  2% heap: 1.1G/4.2G direct: 52M postGC: 858M
    read( 5%) -> (152/217) -> encode(60% 54%) -> (215/216) -> write(20%)
    last tile: 14/4875/7195 (z14 2%) https://www.openstreetmap.org/#map=14/21.39170/-72.88330
0:02:01 INF [mbtiles:write] - Finished z14 in 9s cpu:13s avg:1.5
0:02:01 INF [mbtiles] -  features: [ 5.2M 100% 399k/s ] 269M  tiles: [ 4.1M 349k/s ] 515M 
    cpus: 1.5 gc:  1% heap: 1.6G/4.2G direct: 52M postGC: 833M
    read( -%) ->   (0/217) -> encode( -%  -%) ->   (0/216) -> write( -%)
    last tile: 14/7380/5985 (z14 100%) https://www.openstreetmap.org/#map=14/43.56447/-17.84180
0:02:01 DEB [mbtiles] - Tile stats:
0:02:01 DEB [mbtiles] - z0 avg:7.9k max:7.9k
0:02:01 DEB [mbtiles] - z1 avg:4k max:4k
0:02:01 DEB [mbtiles] - z2 avg:9.4k max:9.4k
0:02:01 DEB [mbtiles] - z3 avg:3.9k max:6.4k
0:02:01 DEB [mbtiles] - z4 avg:1.6k max:4.6k
0:02:01 DEB [mbtiles] - z5 avg:1.4k max:8.1k
0:02:01 DEB [mbtiles] - z6 avg:1.4k max:24k
0:02:01 DEB [mbtiles] - z7 avg:1k max:56k
0:02:01 DEB [mbtiles] - z8 avg:476 max:113k
0:02:01 DEB [mbtiles] - z9 avg:298 max:279k
0:02:01 DEB [mbtiles] - z10 avg:165 max:233k
0:02:01 DEB [mbtiles] - z11 avg:108 max:132k
0:02:01 DEB [mbtiles] - z12 avg:86 max:119k
0:02:01 DEB [mbtiles] - z13 avg:72 max:109k
0:02:01 DEB [mbtiles] - z14 avg:68 max:257k
0:02:01 DEB [mbtiles] - all avg:71 max:0
0:02:01 DEB [mbtiles] -  # features: 5,289,610
0:02:01 DEB [mbtiles] -     # tiles: 4,115,450
0:02:01 INF [mbtiles] - Finished in 29s cpu:52s avg:1.8
0:02:01 INF [mbtiles] -   read    1x(7% 2s wait:24s)
0:02:01 INF [mbtiles] -   encode  2x(47% 13s wait:7s)
0:02:01 INF [mbtiles] -   write   1x(36% 10s sys:1s wait:16s)
0:02:01 INF - Finished in 2m2s cpu:3m28s gc:4s avg:1.7
0:02:01 INF - FINISHED!
0:02:01 INF - 
0:02:01 INF - ----------------------------------------
0:02:01 INF - 	overall          2m2s cpu:3m28s gc:4s avg:1.7
0:02:01 INF - 	lake_centerlines 2s cpu:4s avg:1.8
0:02:01 INF - 	  read     1x(85% 2s)
0:02:01 INF - 	  process  2x(13% 0.3s wait:2s)
0:02:01 INF - 	  write    1x(0% 0s wait:2s)
0:02:01 INF - 	water_polygons   27s cpu:47s gc:2s avg:1.8
0:02:01 INF - 	  read     1x(59% 16s wait:3s)
0:02:01 INF - 	  process  2x(28% 7s wait:12s)
0:02:01 INF - 	  write    1x(4% 1s wait:25s)
0:02:01 INF - 	natural_earth    12s cpu:17s avg:1.4
0:02:01 INF - 	  read     1x(80% 9s sys:1s done:1s)
0:02:01 INF - 	  process  2x(16% 2s wait:10s done:1s)
0:02:01 INF - 	  write    1x(0% 0s wait:11s done:1s)
0:02:01 INF - 	osm_pass1        4s cpu:8s avg:1.8
0:02:01 INF - 	  read     1x(3% 0.1s wait:4s)
0:02:01 INF - 	  parse    1x(58% 2s wait:1s)
0:02:01 INF - 	  process  1x(53% 2s wait:1s)
0:02:01 INF - 	osm_pass2        32s cpu:1m3s avg:2
0:02:01 INF - 	  read     1x(0% 0.1s wait:16s done:16s)
0:02:01 INF - 	  process  2x(77% 25s)
0:02:01 INF - 	  write    1x(1% 0.4s wait:31s)
0:02:01 INF - 	boundaries       0.1s cpu:0.1s avg:1.5
0:02:01 INF - 	sort             5s cpu:6s avg:1.2
0:02:01 INF - 	  worker  1x(73% 4s)
0:02:01 INF - 	mbtiles          29s cpu:52s avg:1.8
0:02:01 INF - 	  read    1x(7% 2s wait:24s)
0:02:01 INF - 	  encode  2x(47% 13s wait:7s)
0:02:01 INF - 	  write   1x(36% 10s sys:1s wait:16s)
0:02:01 INF - ----------------------------------------
0:02:01 INF - 	features	269MB
0:02:01 INF - 	mbtiles	515MB
-rw-r--r-- 1 runner docker 55M May 24 17:34 run.jar

@shermp

shermp commented May 9, 2022

Would using a faster non-cryptographic hashing algorithm improve performance much? Or is performance so limited by IO that it doesn't matter?

which splits the tiles table into tiles_shallow and tiles_data

tiles_shallow contains the coordinates plus a reference to the data ID
tiles_data contains the data ID plus the actual tile data

this allows content to be de-duplicated since multiple tiles can
reference the same data

in this mode, tiles is realized as a view that joins the two tables
tiles_shallow and tiles_data
@shermp

shermp commented May 10, 2022

I got around to testing this using australia-oceania as the region.

Here are the file size differences:

-rw-rw-r--  1 xxxx xxxx 4.9G May 10 14:56 compact.mbtiles
-rw-rw-r--  1 xxxx xxxx 2.3G May 10 14:56 compact.mbtiles.zst
-rw-rw-r--  1 xxxx xxxx  12G May 10 15:16 full.mbtiles
-rw-rw-r--  1 xxxx xxxx 2.4G May 10 15:16 full.mbtiles.zst

When uncompressed, compacted mbtiles are a LOT smaller. There isn't much difference when compressing the files with zstandard, which is probably to be expected.

I tried serving the tiles with tileserver-gl, and while I have no performance figures, the compacted tiles do feel slightly slower than the full tileset. But by the time you add some tile caching into the mix, it may not matter much.

@msbarry
Contributor

msbarry commented May 10, 2022

Thanks for this change @bbilger! This looks like a huge improvement

Would using a faster non-cryptographic hashing algorithm improve performance much? Or is performance so limited by IO that it doesn't matter?

Have you tried sampling with visualvm to see how much of a hot spot the hash is?

I'm also looking into generating pmtiles output, and it uses the FNV hash to solve this problem. I haven't checked for collisions - with a 64-bit hash over 200-300 million tiles there should theoretically be a 1-or-2-in-1000 chance of a collision - but it is very simple and fast:

  private static final long FNV1_64_INIT = 0xcbf29ce484222325L;
  private static final long FNV1_PRIME_64 = 1099511628211L;
  public static long hash(byte[] data) {
    long hash = FNV1_64_INIT;
    for (byte datum : data) {
      hash ^= (datum & 0xff);
      hash *= FNV1_PRIME_64;
    }
    return hash;
  }

Maybe @bdon has a better sense of fnv hash's suitability for planet-scale tilesets?

@shermp

shermp commented May 10, 2022

I guess at the scales we're talking about (millions of tiles), one would really want the chance of collisions to be so low that you don't have to worry about checking for them; otherwise I think you'd have to do a binary compare to know for certain. The last thing we would want is for a data tile to collide with an ocean tile and not get included...

I think if you end up going with a faster hash algorithm, a 128-bit hash would probably be the safer option, and I imagine it should still be faster than sha1/md5.

@msbarry
Contributor

msbarry commented May 10, 2022

Spoke offline with Brandon about hash functions - he raises a good point that it's likely to be only a handful of tiles that are duplicated, and they are most likely small (oceans, parks, etc.). We could apply a cheap pre-filter before deciding to store the tile hash-to-contents mapping in an in-memory map. Some possibilities:

  1. Have some configurable threshold for deduping - i.e. if the tile is under 20kb then compute its hash, and attempt to deduplicate with the in-memory map, otherwise just always write it out as a new value
  2. Compute a hash of the tile over a small hash space where collisions are OK (say 1000 or 100000 values), and use a bitset to track whether we've seen a tile with this small hash before. If we have, then attempt to deduplicate with the in-memory map, otherwise write it out as a new value but mark the bitset so the next occurrence will start to deduplicate

Either of these approaches would drastically reduce the number of distinct tiles, and likely make something like the 32-bit integer FNV hash sufficient to avoid any possibility of a collision - but in any case a cryptographic hash function is likely overkill.
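A rough sketch of what option 2 could look like (an illustration of the idea only, not the implementation in this PR; TileDeduper, lookupOrMark, and the prefilter size are hypothetical names/values):

  import java.util.Arrays;
  import java.util.BitSet;
  import java.util.HashMap;
  import java.util.Map;

  class TileDeduper {
    private static final int PREFILTER_SIZE = 1 << 20;       // small hash space; collisions are OK here
    private final BitSet seen = new BitSet(PREFILTER_SIZE);  // "a tile with this cheap hash was seen before"
    private final Map<Integer, Integer> hashToDataId = new HashMap<>(); // full hash -> tile_data_id

    /** Returns the tile_data_id to reference, or -1 if the caller should insert tileData under newDataId. */
    int lookupOrMark(byte[] tileData, int newDataId) {
      int cheap = Math.floorMod(Arrays.hashCode(tileData), PREFILTER_SIZE);
      if (!seen.get(cheap)) {
        seen.set(cheap);           // first sighting of this cheap hash: skip the "real" hash entirely
        return -1;
      }
      int full = fnv32(tileData);  // possible duplicate: fall back to the full hash
      Integer existing = hashToDataId.putIfAbsent(full, newDataId);
      return existing != null ? existing : -1;
    }

    // FNV-1a, 32-bit
    private static int fnv32(byte[] data) {
      int hash = 0x811c9dc5;
      for (byte b : data) {
        hash ^= (b & 0xff);
        hash *= 0x1000193;
      }
      return hash;
    }
  }

Note that under this scheme the first and second occurrences of a duplicated tile are still written out as separate copies; only from the third occurrence on does a tile get de-duplicated against the stored data ID.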

@bbilger
Contributor Author

bbilger commented May 10, 2022

@shermp
I wrote BenchmarkMbtilesRead to compare reading from equivalent mbtiles files (e.g. compact vs. non-compact) and to get some idea of the performance decrease. On my machine, reading from australia-compact is 10-15% slower than from australia-not-compact.

@msbarry / @shermp
What I did so far was swap SHA-1 for Murmur3 128 - and it had zero impact on performance, which made me believe this is not the bottleneck. But all I did was check the stats produced by planetiler at the end; I haven't analyzed it any further but will try to do so now.

@msbarry
re: 1.: The threshold sounds like a good idea. Do you want it to be configurable as a CLI argument?
re 2.: Do I understand the idea correctly that you effectively want to hash twice - in case of a duplicate? First layer would hash into a small space and store the result to a BitSet, and the second layer would use a "real" hash function (e.g. FNV 32) and store the result in a HashMap (hash, dataId). Is the idea here (in case of no duplicate) that the first hashing would be even cheaper, and looking up from a BitSet is a bit faster than from a HashMap? Sorry, just want to double-check that I get the idea, since it adds some complexity and FNV-32 already looks pretty fast.

...and add 2 layers of hashing: one based on Java-hash + mod
and the other based on FNV1-32 (up for discussion).

The result of the first layer is just stored in a BitSet.
This only allows us to tell if there's a dupe.
For the first layer collisions are expected.
The result of the second layer is a map between the hash and the data ID.
The first layer, however, allows us to delay the need to create a "real" hash.

In addition some "filtering" was added to only perform all this hashing
if the tile's size is below a certain threshold, because dupes only occur
on simple/small tiles like ocean tiles.

As part of this, the actual encoding was extracted, as well.

Note: The improvement is barely noticeable
but it's probably the right thing to do.
@bbilger
Contributor Author

bbilger commented May 10, 2022

Thanks for your feedback, I tried to address it in 0faa63a => please let me know what you think.

Testing with Australia, I can barely notice any difference, though it seems a tiny bit faster. As I said before, I don't think the hashing is the bottleneck (at least I cannot identify it as one in VisualVM), but it's probably still the right thing to not use SHA-1.

One reason I decided to use SHA-1 in the beginning was that I wanted to avoid any collisions. As such I share @shermp's concern that using FNV 32 might be a bit problematic since a collision would be rather bad. Maybe it would be better to use FNV 64, FNV 128, or Murmur3-128?

@msbarry
Contributor

msbarry commented May 11, 2022

On second thought, I looked at some stats... there are 280 million tiles for the planet, but only about 50 million unique tiles, so only about a ~7/100k chance of collision. Also, the size threshold might not be too helpful: looking at Australia, about 85% of unique tiles are <1kb (gzipped, but still a good indicator).

re 2.: Do I understand the idea correctly that you effectively want to hash twice - in case of a duplicate? First layer would hash into a small space and store the result to a BitSet, and the second layer would use a "real" hash function (e.g. FNV 32) and store the result in a HashMap (hash, dataId). Is the idea here (in case of no duplicate) that the first hashing would be even cheaper, and looking up from a BitSet is a bit faster than from a HashMap? Sorry, just want to double-check that I get the idea, since it adds some complexity and FNV-32 already looks pretty fast.

For this approach we'd want a first pass that checks whether we might have seen a tile before, and if so then consult the map from tile hash to tile ID. A bloom filter would work for this first pass, or a bitset where the number of bits is > the expected # of unique tiles.

Looking at those ~7/100k odds of a collision, it might not even be worth the extra check.
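That ~7/100k figure follows from the standard birthday approximation, with n ≈ 50 million unique tiles hashed into a 64-bit space:

  P(collision) ≈ n^2 / (2 * 2^64) = (5 * 10^7)^2 / 2^65 ≈ 6.8 * 10^-5 ≈ 7 in 100,000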

@bdon
Contributor

bdon commented May 11, 2022

Instead of using a size threshold, could we limit the deduplication only to tiles that have a single layer, or a single feature on a single layer? We know a priori that the deduplication is useful for ocean squares and land squares, which might have arbitrary tags that increase the total tile size. We would need the ability to introspect on the PBF content instead of treating it as a blob. Are there good counterexamples on why this won't work?

@msbarry
Contributor

msbarry commented May 11, 2022

It is possible that we can pass through an "is fill" bit for each feature, then inspect easily if all features in a tile are fill polygons without deserializing - that's what I was planning for #168.

So maybe to start on this PR, we could limit the deduping to just tiles with fewer than some small number of features (1? 2? 5?) and do the more robust fill identification as part of #168? There's a getNumFeaturesToEmit method to get the number of features in the TileFeatures argument to the tile encoder without deserializing anything.
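A minimal sketch of that gate (illustration only; the method name worthDeduping and the threshold value are hypothetical, while getNumFeaturesToEmit and TileFeatures are the names mentioned above):

  private static final int MAX_FEATURES_TO_DEDUPE = 5; // hypothetical threshold

  boolean worthDeduping(TileFeatures tileFeatures) {
    // only tiles with very few features (likely fills such as ocean squares) are dedupe candidates
    return tileFeatures.getNumFeaturesToEmit() < MAX_FEATURES_TO_DEDUPE;
  }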

also get rid of the 2 layer hashing (1 simplehash+bitset, 2 fnv+map), again
@bbilger
Contributor Author

bbilger commented May 11, 2022

"NumFeaturesToEmit" seems like the right thing to do and I implemented it accordingly.
Duplicates in Australia were spread as follows per "NumFeaturesToEmit":
{1=13074826, 2=1359105, 3=397821, 4=13822, 5=73, 7=4, 8=2, 9=2, 18=2}
=> trying to de-duplicate/hash only if less than 5 "NumFeaturesToEmit"

...
going to try to tackle packing tiles_shallow's x,y,z into one int in the DB next - experimenting with generated columns

@shermp

shermp commented May 11, 2022

Ran some numbers today; it looks like it's possible to encode tiles up to zoom level 29 in a 64-bit int.

Considering that at each zoom level each axis has 2^zoom tiles, you could encode tiles like so:

Bits 0-4 could be zoom
Bits 5-33 could be y
Bits 34-62 could be x

Or any other similar scheme.
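A minimal sketch of that bit layout (an illustration of the scheme described above, not planetiler code):

  static long encodeTile(int zoom, int x, int y) {
    // bits 0-4: zoom, bits 5-33: y, bits 34-62: x (29 bits each for x and y, zoom up to 29)
    return ((long) x << 34) | ((long) y << 5) | zoom;
  }

  static int zoom(long encoded) { return (int) (encoded & 0x1F); }
  static int y(long encoded)    { return (int) ((encoded >>> 5) & ((1L << 29) - 1)); }
  static int x(long encoded)    { return (int) ((encoded >>> 34) & ((1L << 29) - 1)); }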

@bbilger
Contributor Author

bbilger commented May 11, 2022

👍
Someone please correct me if I am wrong, but I think the max zoom level supported by planetiler is 14. This allows packing it into a 32-bit integer.
For "packing" x,y,z into an int I am using TileCoord#encode, and for the generated columns, which I am currently experimenting with, I am basically using TileCoord#decode.

@shermp

shermp commented May 11, 2022

Are you generating the x,y columns as part of the SQL view?

@bbilger
Contributor Author

bbilger commented May 11, 2022

Well, yes but mostly no. Technically the view remains unchanged from what it is now for the compact mode:

CREATE VIEW tiles AS
select
  tiles_shallow.zoom_level as zoom_level,
  tiles_shallow.tile_column as tile_column,
  tiles_shallow.tile_row as tile_row,
  tiles_data.tile_data as tile_data
from tiles_shallow
join tiles_data on tiles_shallow.tile_data_id = tiles_data.tile_data_id

The key change is to have generated and, most importantly, indexed columns in tiles_shallow:

CREATE TABLE tiles_shallow (
  tile_id integer, -- no primary key was added on purpose since we don't use it and it would hurt write performance
  tile_data_id integer,

  zoom_level integer as ( (tile_id >> 28) + 8 ),
  tile_column integer as ( (tile_id >> 14) & ((1 << 14) - 1) ),
  tile_row integer as ( (1 << zoom_level) - 1 - (((1 << zoom_level) - 1) - ((tile_id) & ((1 << 14) - 1))) )
)
CREATE UNIQUE INDEX tiles_shallow_index on tiles_shallow (zoom_level, tile_column, tile_row)

With a view-only approach like the following, we would end up with a full table scan, which would hurt read performance badly:

CREATE VIEW tiles AS
select
  ( (tiles_shallow.tile_id >> 28) + 8 ) as zoom_level,
  ( (tiles_shallow.tile_id >> 14) & ((1 << 14) - 1) ) as tile_column,
   ( (1 << ( (tiles_shallow.tile_id >> 28) + 8 )) - 1 - (((1 << ( (tiles_shallow.tile_id >> 28) + 8 )) - 1) - ((tile_id) & ((1 << 14) - 1))) ) as tile_row,
  tiles_data.tile_data as tile_data
from tiles_shallow
join tiles_data on tiles_shallow.tile_data_id = tiles_data.tile_data_id

What I can say so far is that:

  • write performance is mostly unaffected - maybe a slight improvement
  • file size is slightly lower

haven't tested impact on read yet

@msbarry
Contributor

msbarry commented May 12, 2022

Someone please correct me if I am wrong, but I think the max zoom level supported by planetiler is 14. This allows packing it into a 32-bit integer.

For now yes although it will increase to 15 or 16 at some point in the future.

write performance is mostly unaffected - maybe a slight improvement

👍 I would expect some write performance increase to come from a simpler index structure, but most to come from being able to fit more tiles into each batch with the 999 limit on prepared statement params

Curious how read performance turns out. To query from application layer, it would be relatively efficient to map from x/y/z to tile ID, then lookup the data for it but I'm not sure how possible it is to express that in a sqlite view...
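To make the earlier batching point concrete (a back-of-the-envelope illustration, assuming the 999-parameter limit mentioned above): with 4 bound parameters per tile (zoom_level, tile_column, tile_row, tile_data) a multi-row INSERT fits 249 tiles per batch, while with 2 (tile_id, tile_data_id) it fits 499.

  // rough illustration of the batching arithmetic, not planetiler code
  static int rowsPerBatch(int paramsPerRow) {
    return 999 / paramsPerRow; // 999 bound variables per prepared statement
  }
  // rowsPerBatch(4) == 249  (zoom_level, tile_column, tile_row, tile_data)
  // rowsPerBatch(2) == 499  (tile_id, tile_data_id)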

Contributor

@msbarry msbarry left a comment


Some comments on the high-level benchmark and verification code. Planning to get down into the implementation details next...

@bbilger
Contributor Author

bbilger commented May 12, 2022

Someone please correct me if I am wrong, but I think the max zoom level supported by planetiler is 14. This allows packing it into a 32-bit integer.

For now yes although it will increase to 15 or 16 at some point in the future.

write performance is mostly unaffected - maybe a slight improvement

+1 I would expect some write performance increase to come from a simpler index structure, but most to come from being able to fit more tiles into each batch with the 999 limit on prepared statement params

Curious how read performance turns out. To query from application layer, it would be relatively efficient to map from x/y/z to tile ID, then lookup the data for it but I'm not sure how possible it is to express that in a sqlite view...

I cannot see a noticeable impact on read performance (BenchmarkMbtilesRead shows something like 1% faster).

The approach - let's call it approach no.1 - presented here #219 (comment) relies on generated x,y,z columns that need to be indexed. Thus the index structure is/should be (mostly) identical.

Unless you want to build a sqlite file that breaks with mbtiles spec, approach no.1 is the only feasible option I can see.
A full table-scan - let's call it approach no.2 - as described at the end of #219 (comment) is obviously a no-go.

This bring us to approach no.3:

create view tiles as
  select zoom_level, tile_column, tile_row, tile_data
  from tiles_shallow ts
  join tiles_data on ts.tile_data_id = tiles_data.tile_data_id
  left join zoom_levels 
  left join tile_columns 
  left join tile_rows
  where ts.tile_id = ((zoom_level-8) << 28) | (tile_column << 14) | (tile_row);

(bear with me if the formula is not quite correct, haven't tested it much but it worked at least for my test coordinates)
zoom_levels, tile_columns, tile_rows are views on a number table (create table numbers (number integer primary key) and it contains numbers from 0 to 1 << 14). The number table is a trick to get hold of the requested parameters zoom_level, tile_column, tile_row to reconstruct the tile_id.
I just created it manually but while it's relatively quick (~5ms), it's definitely slower, and it breaks pretty quickly if you don't filter it completely with zoom_level, tile_column, tile_row.

@msbarry
Contributor

msbarry commented May 13, 2022

Someone please correct me if I am wrong, but I think the max zoom level supported by planetiler is 14. This allows packing it into a 32-bit integer.

For now yes although it will increase to 15 or 16 at some point in the future.

write performance is mostly unaffected - maybe a slight improvement

+1 I would expect some write performance increase to come from a simpler index structure, but most to come from being able to fit more tiles into each batch with the 999 limit on prepared statement params
Curious how read performance turns out. To query from application layer, it would be relatively efficient to map from x/y/z to tile ID, then lookup the data for it but I'm not sure how possible it is to express that in a sqlite view...

I cannot see a noticeable impact on read performance (BenchmarkMbtilesRead shows something like 1% faster).

Cool, I would expect the fastest one to be the original, so I'm mostly interested in how much read qps drops with the more compact formats. If we can get a compact format with the same read performance as the original, that would be ideal.

The approach - let's call it approach no.1 - presented here #219 (comment) relies on generated x,y,z columns that need to be indexed. Thus the index structure is/should be (mostly) identical.

OK, is there any benefit of this one then? It seems the same as x/y/z columns, but with some added complexity?

Unless you want to build a sqlite file that breaks with mbtiles spec, approach no.1 is the only feasible option I can see. A full table-scan - let's call it approach no.2 - as described at the end of #219 (comment) is obviously a no-go.

This bring us to approach no.3:

create view tiles as
  select zoom_level, tile_column, tile_row, tile_data
  from tiles_shallow ts
  join tiles_data on ts.tile_data_id = tiles_data.tile_data_id
  left join zoom_levels 
  left join tile_columns 
  left join tile_rows
  where ts.tile_id = ((zoom_level-8) << 28) | (tile_column << 14) | (tile_row);

(bear with me if the formula is not quite correct, haven't tested it much but it worked at least for my test coordinates) zoom_levels, tile_columns, tile_rows are views on a number table (create table numbers (number integer primary key) and it contains numbers from 0 to 1 << 14). The number table is a trick to get hold of the requested parameters zoom_level, tile_column, tile_row to reconstruct the tile_id. I just created it manually but while it's relatively quick (~5ms), it's definitely slower, and it breaks pretty quickly if you don't filter it completely with zoom_level, tile_column, tile_row.

OK, cool, thanks for exploring this, but if it's looking like there might not be a good way to express this through sqlite views then feel free to ignore!

@msbarry
Contributor

msbarry commented May 16, 2022

Also @bbilger let me know if any of those need more clarification, or if you'd rather have me push the change for one to your branch!

@bbilger
Contributor Author

bbilger commented May 16, 2022

@msbarry all is clear. Sorry, I just didn't have time to work on it over the weekend. I will address the majority of your comments by Friday at the latest.

@bbilger
Contributor Author

bbilger commented May 16, 2022

OK, is there any benefit of this one then? It seems the same as x/y/z columns, but with some added complexity?

The advantage is that write performance should increase since the prepared statement will have just 1 vs 3 arguments. Let me re-write the write benchmark; then a decision can be based on that.

the feature count might also differ => don't fail but just report as diff
see discussion 214
- mbtiles#encode: generate hash (if small) + "detect" memoized
- mbtiles#write: generate dataId + de-dupe (if hash + memoized*)

* not sure if we should de-dupe memoized only, since we could
  further reduce the file size by de-duping all with a hash
- bench mbtiles#write only
- report tile writes / s
- try to make test somewhat realistic
@bbilger
Contributor Author

bbilger commented May 18, 2022

@msbarry
So the benchmarks proved me wrong - not even the writes would improve with approach no.1 (compact tileId). Please find the results below.
I pushed the change for reference but already removed it again since there are no benefits.

Other than that I think I addressed all your feedback but please let me know if you want anything changed in addition or if there's anything else. The last "open" thing is whether to try to de-dupe memoized only or all with a hash.

Note: for some reason (in quite a few cases) reading from the compact DB seems to be faster now. I cannot really explain it, and cannot find a bug in the read benchmark either, though.

Results

gen Australia in not-compact mode

0m0:06:29 INF - 	mbtiles          1m15s cpu:10m34s gc:5s avg:8.5
0m0:06:29 INF - 	  read    1x(20% 15s sys:2s wait:52s done:3s)
0m0:06:29 INF - 	  encode 12x(57% 42s wait:17s done:3s)
0m0:06:29 INF - 	  write   1x(57% 43s sys:6s wait:23s)
0m0:06:29 INF - ----------------------------------------
0m0:06:29 INF - 	features	3.6GB
0m0:06:29 INF - 	mbtiles	3.1GB

gen Australia in compact mode

0m0:06:15 INF - 	mbtiles          1m8s cpu:10m21s gc:4s avg:9.1
0m0:06:15 INF - 	  read    1x(22% 15s sys:2s wait:46s done:3s)
0m0:06:15 INF - 	  encode 12x(61% 42s wait:13s done:3s)
0m0:06:15 INF - 	  write   1x(54% 37s sys:4s wait:23s)
0m0:06:15 INF - ----------------------------------------
0m0:06:15 INF - 	features	3.6GB
0m0:06:15 INF - 	mbtiles	1.7GB

gen Australia in compact-tileId mode

0m0:06:16 INF - 	mbtiles          1m7s cpu:10m16s gc:4s avg:9.3
0m0:06:16 INF - 	  read    1x(21% 14s sys:1s wait:46s done:3s)
0m0:06:16 INF - 	  encode 12x(63% 42s wait:13s done:3s)
0m0:06:16 INF - 	  write   1x(55% 37s sys:3s wait:22s)
0m0:06:16 INF - ----------------------------------------
0m0:06:16 INF - 	features	3.6GB
0m0:06:16 INF - 	mbtiles	1.6GB

BenchmarkMbtilesRead

0m0:00:21 INF - working on /.../australia-not-compact.mbtiles
0m0:01:25 INF - readOperationsPerSecondStats: DoubleSummaryStatistics{count=10, sum=1097363.050225, min=18894.329609, average=109736.305022, max=124308.079321}
0m0:01:25 INF - working on /.../australia-compact.mbtiles
0m0:02:19 INF - readOperationsPerSecondStats: DoubleSummaryStatistics{count=10, sum=1178238.933600, min=28171.143437, average=117823.893360, max=128992.827038}
0m0:02:19 INF - working on /.../australia-compact-tileid.mbtiles
0m0:03:11 INF - readOperationsPerSecondStats: DoubleSummaryStatistics{count=10, sum=1180756.589802, min=29882.028874, average=118075.658980, max=131528.114417}
0m0:03:11 INF - diffs
0m0:03:11 INF - "/.../australia-not-compact.mbtiles" vs "/.../australia-compact.mbtiles": avg read operations per second improved by 7.370020647069566%
0m0:03:11 INF - "/.../australia-not-compact.mbtiles" vs "/.../australia-compact-tileid.mbtiles": avg read operations per second improved by 7.599448474226222%
0m0:03:11 INF - "/.../australia-compact.mbtiles" vs "/.../australia-compact-tileid.mbtiles": avg read operations per second improved by 0.21367959675708903%

BenchmarkMbtilesWrite

--bench-tiles-to-write=1000000 --bench-repetitions=10 --bench-no-dupe-tiles=10 --bench-distinct-tile-data-size=800 --bench-dupe-tile-data-size=100

  --bench-dupe-spread=10

    (non-compact)
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=5535826.018941, min=494772.881352, average=553582.601894, max=572869.469085}

    --compact-db
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=5893505.249478, min=516823.356896, average=589350.524948, max=614381.936503}

    --compact-db (tileId)
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=5629928.311951, min=515534.891533, average=562992.831195, max=584166.313215}

  --bench-dupe-spread=1

    (non-compact)
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=5515072.624620, min=500652.172801, average=551507.262462, max=572924.290469}

    --compact-db
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=6341896.356866, min=553440.630765, average=634189.635687, max=652745.072077}

    --compact-db (tileId)
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=5963269.114919, min=548489.425102, average=596326.911492, max=617753.702735}

  --bench-dupe-spread=50

    (non-compact)
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=5517730.613009, min=493311.037536, average=551773.061301, max=578336.895536}

    --compact-db
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=4996218.743876, min=434994.065552, average=499621.874388, max=517147.994776}

    --compact-db (tileId)
    tileWritesPerSecondsStats: DoubleSummaryStatistics{count=10, sum=4676350.304822, min=435352.136740, average=467635.030482, max=489182.376726}

Contributor

@msbarry msbarry left a comment


Sorry for the delay, this looks great! Leaving a few minor comments, but I see no major issues remaining before we merge. Thanks for putting this together and improving the code along the way!

@ParameterizedTest
@ArgumentsSource(TestArgs.class)
void testFnv32(boolean expectSame, byte[] data0, byte[] data1) {
var hash0 = Hashing.fnv32(data0);
Contributor


Minor suggestion (feel free to ignore) - if you change fnv32 to take a vararg byte... data argument then you could make these tests a bit more compact, something like:

assertEquals(fnv32(1), fnv32(1));

and

assertNotEquals(fnv32(1, 2), fnv32(2, 1));
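For illustration, the vararg signature being suggested would look something like this (a sketch, not the actual change made in the PR):

  public static int fnv32(byte... data) {
    int hash = 0x811c9dc5;   // FNV-1a 32-bit offset basis
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= 0x1000193;     // FNV-1a 32-bit prime
    }
    return hash;
  }

Integer literals would still need a cast at the call site, e.g. fnv32((byte) 1, (byte) 2), which is the cast mentioned in the reply below.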

Contributor Author

@bbilger bbilger May 22, 2022


Hmm, I am a big fan of parameterized tests and the argument provider could have been re-used for other hash functions, but okay, I rewrote it now - though it's not so compact because of the required cast.

getDiffJoined(features1, features0, "\n"));

if (failOnFeatureDiff) {
throw new RuntimeException(msg);
Contributor


To fix sonar's complaint here and below, this could be a more specific exception like AssertionError?

@bbilger
Contributor Author

bbilger commented May 23, 2022

Sorry for the delay, this looks great! Leaving a few minor comments, but I see no major issues remaining before we merge. Thanks for putting this together and improving the code long the way!

Absolutely no problem. Thank you for the review and the feedback. I tried to incorporate your feedback in commit c65c7b7. Let me know if there's anything else.

@msbarry
Contributor

msbarry commented May 24, 2022

Ran some tests this morning on the planet with this branch merged with #225. --compact-db appears to improve write throughput by ~5%, hurt read throughput by ~5%, and make the mbtiles file about 20% smaller (104GB vs 84GB below).

--compact-db=false:

0:38:41 INF - 	mbtiles          14m13s cpu:5h45m29s gc:45s avg:24.3
0:38:41 INF - 	  read    2x(24% 3m22s sys:58s wait:8m20s done:28s)
0:38:41 INF - 	  merge   1x(35% 5m3s sys:13s wait:8m26s done:13s)
0:38:41 INF - 	  encode 62x(33% 4m38s sys:8s wait:8m53s done:13s)
0:38:41 INF - 	  write   1x(87% 12m25s sys:1m29s wait:55s)
0:38:41 INF - ----------------------------------------
0:38:41 INF - 	features	184GB
0:38:41 INF - 	mbtiles	104GB

--compact-db=true:

0:37:38 INF - 	mbtiles          13m29s cpu:5h50m42s gc:46s avg:26
0:37:38 INF - 	  read    2x(26% 3m29s sys:1m1s wait:7m18s done:25s)
0:37:38 INF - 	  merge   1x(32% 4m22s sys:14s wait:8m23s done:11s)
0:37:38 INF - 	  encode 62x(35% 4m44s sys:9s wait:8m2s done:11s)
0:37:38 INF - 	  write   1x(87% 11m40s sys:1m18s wait:53s)
0:37:38 INF - ----------------------------------------
0:37:38 INF - 	features	184GB
0:37:38 INF - 	mbtiles	84GB

The read benchmark shows about 67.7k reads/s with regular and 64.7k reads/s with compact (I was running short on time so did a smaller test)

0:00:00 DEB - argument: bench_repetitions=3 (number of repetitions)
0:00:00 DEB - argument: bench_nr_tile_reads=50000 (number of tiles to read)
0:00:00 DEB - argument: bench_pre_warms=1 (number of pre warm runs)
0:00:00 DEB - argument: bench_mbtiles=data/output.mbtiles,data/compact.mbtiles (the mbtiles file to read from)
0:00:36 INF - working on data/output.mbtiles
0:00:43 INF - readOperationsPerSecondStats: DoubleSummaryStatistics{count=3, sum=203141.152027, min=62994.850298, average=67713.717342, max=70282.853583}
0:00:43 INF - working on data/compact.mbtiles
0:00:51 INF - readOperationsPerSecondStats: DoubleSummaryStatistics{count=3, sum=193959.728756, min=63488.766430, average=64653.242919, max=65851.192021}
0:00:51 INF - diffs
0:00:51 INF - "data/compact.mbtiles" to "data/output.mbtiles": avg read operations per second improved by 4.73367504185164%

Previous tests before this change took about 12m +/- 10s to write mbtiles so I don't have any concerns about --compact-db=false performance being different from what's on main now.

Contributor

@msbarry msbarry left a comment


Looks good! 2 minor followup comments then it should be good to merge.

public int generateContentHash() {
int hash = Hashing.FNV1_32_INIT;
for (var feature : entries) {
long layerId = extractLayerIdFromKey(feature.key());
Contributor


layerId starts as a byte - can we just keep it a byte and avoid Longs.toByteArray ?

Contributor Author


sure! sorry, got fooled by #hasSameContents in which the result of #extractLayerIdFromKey is assigned to a variable of type long

@@ -0,0 +1,23 @@
package com.onthegomap.planetiler.util;

public final class Hashing {
Contributor


Can you add a short javadoc to this class and the public members that explains what it's doing / how to use it?

@sonarcloud

sonarcloud bot commented May 24, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

57.7% Coverage
0.0% Duplication
