TileReduce is a geoprocessing library that implements MapReduce to let you run scalable distributed spatial analysis using JavaScript and Mapbox Vector Tiles. TileReduce coordinates tasks across all available processors on a machine, so your analysis runs in parallel on every core.
npm install @mapbox/tile-reduce
A TileReduce processor is composed of two parts: the "map" script and the "reduce" script. The "map" script contains the expensive processing you want to distribute, while the "reduce" script performs the quick aggregation step.
The map script operates on each individual tile. Its purpose is to receive one tile at a time, analyze or process that tile, and then write data to the output and send results to the reduce script.
See the count example processor's map script
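As a rough illustration of the map script contract described above, here is a minimal sketch that counts the features in one tile. The source name `osmdata` and layer name `osm` are borrowed from the sources example in this README, the function name `countFeatures` is hypothetical, and the sketch assumes each layer arrives as a GeoJSON FeatureCollection (the default, non-raw representation).

```javascript
// map.js (sketch): count the features of one layer in a single tile.
// Assumes a source named 'osmdata' with a layer named 'osm', delivered
// as a GeoJSON FeatureCollection (the default, non-raw representation).
function countFeatures(sources, tile, write, done) {
  var layer = sources.osmdata && sources.osmdata.osm;
  var count = layer && layer.features ? layer.features.length : 0;

  // The first argument to done is an error (null here); the second is
  // the value handed to the 'reduce' event on the main process.
  done(null, count);
}

module.exports = countFeatures;
```

Guarding against a missing source or layer keeps the script from throwing on tiles that contain no data for that layer.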
The reduce script serves both to initialize TileReduce with job options, and to handle reducing results returned by the map script for each tile.
See the count example processor's reduce script
zoom specifies the zoom level of tiles to retrieve from each source.
tilereduce({
  zoom: 15,
  // ...
})
map is the path to the map script, which is executed against each tile.
tilereduce({
  map: path.join(__dirname, 'map.js')
  // ...
})
By default, TileReduce creates one worker process per CPU. maxWorkers may be used to limit the number of workers created.
tilereduce({
  maxWorkers: 3,
  // ...
})
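If a fixed number is too rigid, the limit can be derived from the machine itself. The following is a small sketch, not part of TileReduce: `workerLimit` is a hypothetical helper that leaves a given number of cores free for other work and never returns less than one.

```javascript
var os = require('os');

// Hypothetical helper: use every core except the reserved ones,
// and never fewer than one worker.
function workerLimit(reservedCores) {
  return Math.max(1, os.cpus().length - reservedCores);
}

var jobOptions = {
  maxWorkers: workerLimit(1)
  // ... other TileReduce options
};
```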
By default, any data written from workers is piped to process.stdout on the main process. You can pipe to an alternative writable stream using the output option.
tilereduce({
  output: fs.createWriteStream('output-file'),
  // ...
})
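On the worker side, data reaches that output stream through the map script's write argument. The sketch below shows one way this might look, emitting matching features as newline-delimited GeoJSON. It assumes write accepts a string, reuses the `osmdata` source and `osm` layer names from this README, and filters on a `highway` property purely for illustration; `writeHighways` is a hypothetical name.

```javascript
// map.js (sketch): emit selected features as newline-delimited GeoJSON.
// Assumes write accepts a string and that the 'osmdata' source exposes
// an 'osm' layer as a GeoJSON FeatureCollection.
function writeHighways(sources, tile, write, done) {
  var layer = sources.osmdata && sources.osmdata.osm;
  var features = layer && layer.features ? layer.features : [];
  var written = 0;

  features.forEach(function (feature) {
    // 'highway' is an illustrative property filter, not part of TileReduce.
    if (feature.properties && feature.properties.highway) {
      write(JSON.stringify(feature) + '\n');
      written++;
    }
  });

  // Report how many features this tile contributed.
  done(null, written);
}

module.exports = writeHighways;
```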
Setting log to false disables logging and progress output.
tilereduce({
  log: false,
  // ...
})
mapOptions passes arbitrary options through to workers. Options are made available to map scripts as global.mapOptions.
tilereduce({
  mapOptions: {
    bufferSize: 4
  }
  // ...
})
// map.js
module.exports = function (sources, tile, write, done) {
  global.mapOptions.bufferSize; // = 4
  done(null, null); // signal that this tile is finished
};
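Because global.mapOptions only exists when the job supplies it, a map script may want a fallback. This is a minimal sketch under that assumption: `getBufferSize`, `mapWithOptions`, and the default value are hypothetical, and only the `bufferSize` option name comes from the example above.

```javascript
// map.js (sketch): read a worker option with a fallback default.
// DEFAULT_BUFFER_SIZE is an illustrative value, not a TileReduce default.
var DEFAULT_BUFFER_SIZE = 1;

function getBufferSize() {
  var options = global.mapOptions || {};
  return typeof options.bufferSize === 'number'
    ? options.bufferSize
    : DEFAULT_BUFFER_SIZE;
}

function mapWithOptions(sources, tile, write, done) {
  // Hand the effective setting back alongside the tile for inspection.
  done(null, { tile: tile, bufferSize: getBufferSize() });
}

module.exports = mapWithOptions;
```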
Sources are specified as an array in the sources option:
tilereduce({
  sources: [
    /* source objects */
  ],
  // ...
})
tilereduce({
  sources: [
    {
      name: 'osmdata',
      mbtiles: __dirname+'/latest.planet.mbtiles',
      layers: ['osm']
    }
  ]
})
MBTiles work well for optimizing tasks that request many tiles, since the data is stored on disk. Create your own MBTiles from vector data using tippecanoe, or use OSM QA Tiles, a continuously updated MBTiles representation of OpenStreetMap.
Remote Vector Tile sources accessible over HTTP work well for mashups of datasets and for datasets that would not be practical to fit on a single machine. Be aware that HTTP requests are slower than reading MBTiles from disk, and throttling is typically required to avoid disrupting servers at high tile volumes. maxrate dictates how many requests per second will be made to each remote source.
sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    layers: ['roads'],
    maxrate: 10
  }
]
By default, sources will be automatically converted from their raw Vector Tile representation to GeoJSON. If you set raw: true in an MBTiles or URL source, the raw Vector Tile data will be provided, allowing you to lazily parse features as needed. This is useful in some situations for maximizing performance.
sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    raw: true
  }
]
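To make the lazy parsing idea concrete, here is a hedged sketch of a map script working on a raw source. It assumes the raw layer behaves like a VectorTileLayer from @mapbox/vector-tile (a length, a feature(i) accessor, and features with properties and toGeoJSON(x, y, z)), and that the tile argument is an [x, y, z] array. The `roads` layer name comes from the earlier remote source example; the `class` property value and the name `collectMotorways` are illustrative.

```javascript
// map.js (sketch): lazily inspect raw vector tile features.
// Assumes sources.streets.roads exposes length, feature(i), and features
// with properties and toGeoJSON(x, y, z), as in @mapbox/vector-tile.
function collectMotorways(sources, tile, write, done) {
  var layer = sources.streets && sources.streets.roads;
  var motorways = [];

  if (layer) {
    for (var i = 0; i < layer.length; i++) {
      var feature = layer.feature(i);
      // Check the properties first; only matching features are
      // converted to GeoJSON, which is where geometry gets decoded.
      if (feature.properties.class === 'motorway') {
        motorways.push(feature.toGeoJSON(tile[0], tile[1], tile[2]));
      }
    }
  }

  done(null, motorways.length);
}

module.exports = collectMotorways;
```

Filtering before conversion is the point of the raw option: tiles with many features but few matches skip most of the geometry work.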
Jobs run over a geographic region represented by a set of tiles. TileReduce also accepts several area definitions that will be automatically converted into tiles.
bbox is a valid bounding box array in [west, south, east, north] order.
tilereduce({
  bbox: [w, s, e, n],
  // ...
})
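To give a feel for what converting a bounding box into tiles involves, the sketch below uses the standard Web Mercator tiling formulas. It is an illustration only, not TileReduce's own conversion code: `pointToTile` and `bboxToTiles` are hypothetical names, and boxes that cross the antimeridian are not handled.

```javascript
// Convert longitude/latitude (degrees) to [x, y, z] tile indices at a zoom
// level using the standard Web Mercator ("slippy map") tiling formulas.
function pointToTile(lon, lat, zoom) {
  var n = Math.pow(2, zoom);
  var latRad = lat * Math.PI / 180;
  var x = Math.floor((lon + 180) / 360 * n);
  var y = Math.floor(
    (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n
  );
  // Clamp so that lon = 180 or extreme latitudes stay inside the grid.
  return [
    Math.min(Math.max(x, 0), n - 1),
    Math.min(Math.max(y, 0), n - 1),
    zoom
  ];
}

// List every [x, y, z] tile touched by a [west, south, east, north] box.
function bboxToTiles(bbox, zoom) {
  var topLeft = pointToTile(bbox[0], bbox[3], zoom);
  var bottomRight = pointToTile(bbox[2], bbox[1], zoom);
  var tiles = [];
  for (var x = topLeft[0]; x <= bottomRight[0]; x++) {
    for (var y = topLeft[1]; y <= bottomRight[1]; y++) {
      tiles.push([x, y, zoom]);
    }
  }
  return tiles;
}
```

The tile count grows roughly fourfold with each zoom level, which is worth keeping in mind when choosing zoom for a large bbox.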
geojson is a valid GeoJSON geometry of any type.
tilereduce({
  geojson: {"type": "Polygon", "coordinates": [/* coordinates */]},
  // ...
})
tiles is an array of quadtiles, each represented as an [x, y, z] array.
tilereduce({
  tiles: [
    [x, y, z]
  ],
  // ...
})
With tileStream, tiles can be read from an object mode Node stream. Each object in the stream should be either a string in the format x y z or an array in the format [x, y, z].
tilereduce({
  tileStream: /* an object mode node stream */,
  // ...
})
Line-separated tile list files can easily be converted into the appropriate object mode streams using binary-split:
var split = require('binary-split'),
    fs = require('fs');
tilereduce({
  tileStream: fs.createReadStream('/path/to/tile-file').pipe(split()),
  // ...
})
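When the tile list is already in memory, an object mode stream can also be built with Node's built-in stream module. The following is a sketch under that assumption: `parseTileLine` is a hypothetical helper that validates the x y z string format and returns the array form, and the tile coordinates are made-up sample values.

```javascript
var Readable = require('stream').Readable;

// Parse one "x y z" line into an [x, y, z] array of integers.
function parseTileLine(line) {
  var parts = String(line).trim().split(/\s+/).map(Number);
  if (parts.length !== 3 || !parts.every(Number.isInteger)) {
    throw new Error('Expected "x y z", got: ' + line);
  }
  return parts;
}

// Readable.from produces an object mode stream from any iterable;
// each [x, y, z] array becomes one object in the stream.
var tileStream = Readable.from(
  ['5241 12665 15', '5242 12665 15'].map(parseTileLine)
);
```

The resulting `tileStream` could then be passed as the tileStream option in place of the file-based stream.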
When using MBTiles sources, a list of tiles to process can be automatically retrieved from the source metadata by setting sourceCover to the name of the source:
tilereduce({
  sourceCover: 'osmdata',
  sources: [
    {
      name: 'osmdata',
      mbtiles: __dirname+'/latest.planet.mbtiles'
    }
  ]
  // ...
})
TileReduce returns an EventEmitter.
The start event is fired once all workers are initialized and before the first tiles are sent for processing.
tilereduce({/* ... */})
  .on('start', function () {
    console.log('starting');
  });
The map event is fired just before a tile is sent to a worker. It receives the tile and the number of the worker assigned to process the tile.
tilereduce({/* ... */})
  .on('map', function (tile, workerId) {
    console.log('about to process ' + JSON.stringify(tile) + ' on worker ' + workerId);
  });
The reduce event is fired when a tile has finished processing. It receives the data passed to the map function's done callback (if any), and the tile.
var count = 0;
tilereduce({/* ... */})
  .on('reduce', function (result, tile) {
    console.log('got a count of ' + result + ' from ' + JSON.stringify(tile));
    count++; // tallies the number of tiles processed
  });
The end event is fired when all queued tiles have been processed. Use this event to output final reduce results.
var count = 0;
tilereduce({/* ... */})
  .on('end', function () {
    console.log('Total count was: ' + count);
  });
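The reduce and end events are usually used together: one accumulates, the other reports. The sketch below shows that pattern in isolation, using a plain Node EventEmitter as a stand-in for the emitter returned by tilereduce so it can run without any tile data. `sumResults` and its `onTotal` callback are hypothetical names, and the sketch assumes each map result is a number.

```javascript
var EventEmitter = require('events');

// Attach reduce/end handlers to a job emitter and report the total once.
// onTotal is a hypothetical callback used to surface the final result.
function sumResults(job, onTotal) {
  var total = 0;
  job.on('reduce', function (result, tile) {
    // Each result is whatever the map script passed to done.
    total += result;
  });
  job.on('end', function () {
    onTotal(total);
  });
  return job;
}

// Stand-in for the emitter returned by tilereduce({...}), for illustration.
var fakeJob = sumResults(new EventEmitter(), function (total) {
  console.log('Total count was: ' + total);
});
fakeJob.emit('reduce', 3, [0, 0, 1]);
fakeJob.emit('reduce', 4, [1, 0, 1]);
fakeJob.emit('end'); // prints "Total count was: 7"
```

Keeping the accumulator inside a function rather than at module scope makes the aggregation step easy to exercise on its own.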
- osm-coverage - a processor for computing statistics about OpenStreetMap coverage across countries.
- osm-sidewalker - a processor for detecting potentially untagged sidewalks in OpenStreetMap.
npm test
npm run lint
npm run cover