Node location cache: off-heap storage and “array” implementation that supports parallel inserts #131
Conversation
https://github.com/onthegomap/planetiler/actions/runs/2008248729
ℹ️ Base Logs 1cfcca2
ℹ️ This Branch Logs 4005b3c
Are you noticing significant gains with the parallel version? I guess that the machine used to produce the benchmarks displayed in the issue has a limited number of cores. When experimenting with your code to replace lmdb in baremaps, I tried to make the memory and the serialization of records pluggable. The resulting … As the …
Yes, the parallel version is quite a bit faster (for writes at least) - but only when running over the whole planet. I ran on a machine with 32 GB RAM and 16 CPUs, and the memory-mapped sparse array took about 11 minutes (15-17 million nodes per second, 4-6 million ways per second) but the parallel array implementation took 4m44s (40-50 million nodes per second, 6-10 million ways per second). And the data storage ends up only a little bit larger (74 GB instead of 68 GB). The RAM parallel array implementation only takes 4m10s.
I am also curious if there's any noticeable difference using …
I just tried to run a benchmark to answer the question about write performance. Most of the benchmarks I found online were quite outdated (4 years old or more). Writing data on disk with mapped buffers looks suspiciously fast on my 2020 MacBook Pro (I use the force method to ensure that the changes are written to the storage device). When it comes to byte arrays and heap buffers, an overhead is noticeable when writing less than 1 MB. Above 1 MB, the two solutions look on par. Direct buffers are slower than heap buffers, but they probably put less pressure on the garbage collector.
Do not hesitate to let me know if you spot an issue in the benchmark. For now, the results tend to suggest that using …
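For reference, a minimal sketch of the kind of comparison described above — heap buffer vs. direct buffer vs. memory-mapped file writes with a `force()` at the end. This is not the actual benchmark (no JMH, no warmup, and the 1 MB size and file name are arbitrary); it only illustrates the three write paths being compared:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BufferWriteSketch {
  private static final int SIZE = 1 << 20; // 1 MB per run (arbitrary)

  public static void main(String[] args) throws IOException {
    time("heap", ByteBuffer.allocate(SIZE));
    time("direct", ByteBuffer.allocateDirect(SIZE));

    Path file = Files.createTempFile("mmap-write", ".bin");
    try (FileChannel channel = FileChannel.open(file,
        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
      long start = System.nanoTime();
      for (int i = 0; i < SIZE; i += Long.BYTES) {
        mapped.putLong(i, i);
      }
      mapped.force(); // flush changes to the storage device, as mentioned above
      System.out.println("mmap: " + (System.nanoTime() - start) / 1_000_000 + "ms");
    } finally {
      Files.deleteIfExists(file);
    }
  }

  private static void time(String name, ByteBuffer buffer) {
    long start = System.nanoTime();
    for (int i = 0; i < SIZE; i += Long.BYTES) {
      buffer.putLong(i, i); // absolute put, no position bookkeeping needed
    }
    System.out.println(name + ": " + (System.nanoTime() - start) / 1_000_000 + "ms");
  }
}
```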
Thanks @bchapuis! I did some testing too with …
Ran some performance tests on a c5ad.16xlarge (64 CPUs, 128 GB RAM) instance.

tl;dr: the parallel array nodemap implementation shaves about 5 minutes off of OSM pass 1 (8m -> 3m) when storage=ram, or 4 minutes when storage=mmap (8m -> 4m). Node read performance in pass 2 is about the same whether RAM or MMAP storage is used, so switching to MMAP and using a smaller Xmx setting (~45g or less) only adds about 1 minute to generation time.

Configurations tested:
- Xmx=115g nodemap=array/ram
- Xmx=115g nodemap=sparsearray/ram
- Xmx=45g nodemap=array/mmap/madvise=true
MMap's read performance is impressive. Does your use of madvise significantly impact performance, or do you think something changed in the way memory-mapped files are implemented in Java 17?
Before Java 13 you needed to set the position on byte buffers and then read in separate calls; Java 13 added positional get methods so multiple threads can read from the same file without synchronization, which I think helped use cases like this. Now, as long as you have more free memory than the file, it should be almost as fast as reading from RAM. Madvise probably didn't make much of a difference here, but it should help when the amount of free RAM is less than the node cache file.
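To make the positional-read point concrete, here is a hedged sketch (not planetiler's actual code; the class and method names are made up) of reading packed node locations from a memory-mapped file with the absolute `getLong(int index)` accessor, so worker threads never touch the buffer's shared position and need no synchronization:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative reader: node ID -> packed lat/lon long stored at offset nodeId * 8. */
public class MappedNodeReader {
  private final MappedByteBuffer buffer;

  public MappedNodeReader(Path file, long size) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // A real implementation maps the file in segments, since a single
      // MappedByteBuffer is limited to Integer.MAX_VALUE bytes; the mapping
      // stays valid after the channel is closed.
      this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
    }
  }

  /** Absolute get: no shared position, so many threads can call this concurrently. */
  public long getPackedLocation(long nodeId) {
    return buffer.getLong((int) (nodeId * Long.BYTES));
  }
}
```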
Thanks a lot for the clarification. I was aware of the positional get methods, but I would not have thought they would have such an impact, even when a lot of RAM is available.
… supports parallel inserts (onthegomap#131)

* Add --nodemap-type=array option for 2-3x faster osm pass 1 imports
* Add --nodemap-storage=direct option to experiment with direct (off-heap) memory usage
* Extract ResourceUsage and OsmPhaser utilities
Add more options for node location storage (aka `LongLongMap`, since lat/lon gets packed into a 64-bit long for storage) and some code improvements along the way:

- `--nodemap-type=array` option that stores node locations in an array indexed by node ID (8 bytes * max node ID) - and also supports parallel inserts. This makes OSM pass 1 2-3x faster on larger machines (a rough sketch of the idea follows this list)
- `--nodemap-storage=direct` option to store the node location cache in native off-heap "direct" memory using `ByteBuffer.allocateDirect` (you'll likely need to increase the `-XX:MaxDirectMemorySize` setting to use this)
- `ResourceUsage` class that prints a more detailed breakdown of usage
- `OsmPhaser` class that also logs more details when a node begins/ends
- `posix_madvise` instead of `madvise`
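A rough, hypothetical sketch of the idea behind the `--nodemap-type=array` option and the lat/lon packing mentioned above. The class name, the fixed-point packing format, and the use of `AtomicLongArray` are assumptions for illustration, not planetiler's implementation; real OSM node IDs also exceed `int` range, so the actual storage is segmented:

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Sketch of an "array" LongLongMap: node locations stored in a flat array
 * indexed by node ID (8 bytes per possible ID). Writes to distinct indexes
 * need no coordination, so multiple worker threads can insert in parallel.
 */
public class ArrayNodeMap {
  private final AtomicLongArray locations; // volatile semantics for cross-thread visibility

  public ArrayNodeMap(int maxNodeId) {
    this.locations = new AtomicLongArray(maxNodeId + 1);
  }

  /** Pack lat/lon into one 64-bit value as two 32-bit fixed-point (1e-7 degree) ints. */
  static long pack(double lat, double lon) {
    int latE7 = (int) Math.round(lat * 1e7);
    int lonE7 = (int) Math.round(lon * 1e7);
    return ((long) latE7 << 32) | (lonE7 & 0xFFFFFFFFL);
  }

  static double latOf(long packed) {
    return (packed >> 32) / 1e7;
  }

  static double lonOf(long packed) {
    return ((int) packed) / 1e7;
  }

  /** Safe to call from multiple threads as long as each node ID is written once. */
  public void put(int nodeId, double lat, double lon) {
    locations.set(nodeId, pack(lat, lon));
  }

  public double[] get(int nodeId) {
    long packed = locations.get(nodeId);
    return new double[]{latOf(packed), lonOf(packed)};
  }
}
```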
land checklist:
- `ByteBuffer.allocateDirect`
- `"array"` nodemap type that supports parallel inserts from multiple threads
- `osm_pass1` phase to support a sequential and parallel execution model, based on if the node map implementation supports parallel inserts
- `allocate` vs. `allocateDirect`
- `-XX:MaxDirectMemorySize` if set, otherwise `Runtime.getRuntime().maxMemory()` (see the sketch below).
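To illustrate the last checklist item, a hedged sketch of one way to pick a direct-memory limit: honor `-XX:MaxDirectMemorySize` if it was passed on the command line, otherwise fall back to `Runtime.getRuntime().maxMemory()` (the heap limit). Scanning the JVM input arguments and the helper names below are assumptions for illustration, not necessarily how planetiler detects the limit:

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: resolve a cap for off-heap "direct" allocations. */
public class DirectMemoryLimit {
  public static long detect() {
    // Look for an explicit -XX:MaxDirectMemorySize=... JVM argument first.
    for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
      if (arg.startsWith("-XX:MaxDirectMemorySize=")) {
        return parseSize(arg.substring("-XX:MaxDirectMemorySize=".length()));
      }
    }
    // Otherwise fall back to the max heap size.
    return Runtime.getRuntime().maxMemory();
  }

  /** Parse values like "4g", "512m", "1048576" into bytes. */
  private static long parseSize(String value) {
    long multiplier = 1;
    char last = Character.toLowerCase(value.charAt(value.length() - 1));
    if (last == 'k') multiplier = 1L << 10;
    else if (last == 'm') multiplier = 1L << 20;
    else if (last == 'g') multiplier = 1L << 30;
    if (multiplier > 1) value = value.substring(0, value.length() - 1);
    return Long.parseLong(value) * multiplier;
  }
}
```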