-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support property aggregation on clustered features #2412
Comments
Interesting feature. Can you describe your use case in detail? How should it aggregate the properties? E.g. suppose we cluster a million points where each has different property keys and values — how should the top cluster properties look like then? |
User must provide an aggregation function, so in a case of points belonging to a cluster have different property keys and values or value types everything should be handled in that function. Mapbox-gl just applies it to the dataset. Let's say I want to build a population heatmap using population data for every human settlement in area. Feature example:
Now, I want every cluster to have a sum population of all included points as a property. I configure the source like this: map.addSource("points", {
"type": "geojson",
"data": "data/points.geojson",
"cluster": true,
"clusterMaxZoom": 14,
"clusterRadius": 20,
"clusterAggregations": {
"population_sum": function(prevValue, feature, i, arr) {
if (!('population' in feature) || isNaN(parseFloat(feature.population))) {
return prevValue; // handle the case of missing keys or wrong type
}
return prevValue + parseFloat(feature.population);
}
}
}); ... and every cluster from this source receives the |
I believe we won't be able to accept a function, since the mapbox style specification only allows valid JSON. Would it be enough to have something like this? clusterAggregate: ["population"] Then it would just do a sum for all given properties. |
It would be enough for certain cases, but definitely not as flexible. Do you think something like this http://bl.ocks.org/gisminister/10001728 could be possible with Mapbox GL API? |
Can also have some common mathematical functions That would be helpful. |
+1 for this feature request. Specifically:
Requested aggregate calculations for circle-cluster property data to support would be min, max, average, and sum. As an example, here is the map visual I want to create. To achieve this today, I aggregate the geojson data on the server to return one point with the aggregated "property" value I want. I'd like to completely port my heatmap calculations over to the new circle-cluster and circle data-driven style. The ask would be to be able to return the raw geojson point data from my data set, and cluster it using the new circle-cluster / geojson property aggregation feature. One challenge may be enforcing data types - geojson property data could be strings, floats, ints, etc. Some aggregate calcs may apply to certain data types but not others. |
Hi, is there any news on this feature? It would be awesome to have it on GL. |
I'd love this functionality too. I'm attempting to display several million points on a map, retrieved by an Elasticsearch geohash aggregate query (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html). This returns data where each row already includes a sum of the points within a certain area. As it stands the mapbox gl clustering code would only sum the number of results returned not the already-summed values within each result, which totals several million. |
I note this is already an issue on supercluster: mapbox/supercluster#12 |
I've been playing around with this and have something working with supercluster to pass in an arbitrary function that will let you aggregate properties. Then I was thinking for mapbox you could pass in something like It would be simple enough to have I have a feeling there will be a lot of red tape around this though - I pretty much have it coded already but it requires changes to supercluster, mapbox-gl, and mapbox-gl-style-spec. |
As stated above we cannot accept arbitrary functions for technical reasons.
If you're serious about working on this project (:heart:), please open a ticket in the |
I had a stab at this too and am using this fork in production: It uses a PR made by a kind person on supercluster (mapbox/supercluster#12) and my own fork of mapbox-gl-style-spec (https://github.com/davecranwell/mapbox-gl-style-spec) which adds support for the field added in the PR. Was unsure whether to submit this PRs since the one on supercluster has been lurking for a while now. |
@lucaswoj My understanding is that we can't pass a function into the mapbox styles because they only take JSON - I'd imagine part of that reasoning is because data passed to a worker needs to be serializable. However, supercluster could take any function and mapbox-gl could be the one to pass in the function (from the worker) based on the JSON you give it. Please correct me if I'm understanding something wrong. I'd be happy to work on the project - I'll get everything together soon and open a ticket. @davecranwell The mapbox/supercluster#12 pr is really similar to what I have, but the problem I have with it is you can only work with one property (metric). I think it would be handy to be able to aggregate multiple properties, so for example you can get the sum of one property, and the max of another, for example. I guess in theory you could have a property that is a hash and use that as the metric, but then with just adding |
Can you provide a concrete example (pretend Javascript) of how this would work?
Agreed. Whatever solution we end up merging will most likely work the way you describe. |
Sure, I still need to write tests and make changes to the spec. Also I figure it will need to be tested and code reviewed for style, browser support, error checking, safety/security, speed, etc. Regardless here's what I have so far: supercluster changes Then in your mapbox style you could provide an option like this:
For now there's no error checking, so if you don't have those properties you'll just get some concatted strings or some NaN's or something else weird probably. It just assumes those properties will be on each point, and carries the accumulated values over to the clusters. So your clusters will end up with the same properties, but with aggregated values. You could still filter the clusters and style them differently if you want by using |
Any news on this topic? I'd like to have a behavior for MGL similar to https://github.com/SINTEF-9012/PruneCluster which works great with Leaflet. |
@lamuertepeluda per https://www.mapbox.com/mapbox-gl-js/roadmap/ this feature is under active development right now 😄 |
@lucaswoj that's great news. I started contributing to the clustering engine, "supercluster". See I'll also try to fork mapbox-gl and add the accumulator parameter to the geojson source. I think it should have good enough performances. |
Hey, I've been pretty busy but I'll look into that pull request hopefully tonight. I'd actually been working off the cluster-agg-hash-array branch after some discussion about the style spec in mapbox/DEPRECATED-mapbox-gl#26. I'll merge that into master (in redbmk/supercluster) as well. @lucaswoj are we still depending on mapbox/mapbox-gl-style-spec#47 (it looks like the discussion for this has moved around a bit, but I think that's the latest). I don't see it on the roadmap. |
Alright I updated the master and cluster-agg-hash-array branches of redbmk/supercluster. We still need to come to some consensus on the syntax and I feel like the style spec needs to be fleshed out a bit. Right now it just accepts any object and has a comment suggesting how to use it, but it seems like there should be a way to specify exactly what options are allowed. |
I think the aggregator function should just be a parameter of the geojson source (worker options), see the workerOptions configuration rather than the style object. Once you do this, you should get the custom properties defined in the aggregator function together with {point_count} for each cluster like in the cluster example. I tried upgrading supercluster to the current version, and this makes {point_count} and {cluster_id} available for each cluster. So I think it would be enough to add an aggregator function parameter to the addSource that get passed down to the supercluster constructor as modified by @redbmk and me to get the custom aggregations for each point. This wouldn't require any changes to the current style specs but just to add some other layers, that like the "cluster-count" layer of the example, only represent the cluster population information. I already managed to do this but I was missing the information about the cluster population, that's why I switched to the mod of supercluster. |
That's how I have it coded here. The code is over a month behind from master now, but it's the same idea you mentioned, going through workerOptions. Ideally we would be able to add an aggregator function via the style spec or via the addSource function. That way you could add it to the original style and/or via addSource. I believe it's technically possible to add arbitrary functions to a web worker by creating a Blob. If that's true, maybe we could pass in a function as a string that would be parsed in the worker. Maybe the string could be either a url to a function (which could be a blob url or an external script), or a string representation of a function (e.g. |
Here is an interim solution to property aggregation using Supercluster in GL JS. https://bl.ocks.org/ryanbaumann/01b2c7fc0ddb7b27f6a72217bd1461ad |
Any updates on this? |
@benderlidze there is a lot of activity happening at #4777 to allow for arbitrary expressions. Once that is complete, it will be possible to use those expressions to add support for property aggregation. |
In addition to that v0.39, which was released a few weeks ago, ships with an updated version of supercluster that supports property clustering. |
@kkaefer there's no way to take advantage of that supercluster feature (without forking GL JS) until we expose it through the style spec. |
Any news on this feature? Alternative workaround? |
Any e.t.a for this feature? |
I attempted a new workaround approach (which is a bit complex and I'd like to publish as an example on dedicated github repo as soon as I have time) but mainly works like this:
Each cluster gets the custom stats defined in the worker. Of course it is suboptimal and I want to verify some other details, but it basically works and it's maybe better than having a copy of supercluster in the main thread, which |
Is there an update on this? We would also like to add this feature in our project. |
This would be a great feature for something we're working on at Open Listings Previously we've just been displaying homes on our map, which has been working fine with clustering. But we're currently working on coalescing homes that are all in the same building into a singular geojson feature, so there'll be less markers on our maps in densely populated areas. With the way clusters are working now, the I think if we store the building home count in our geojson metadata, then there should be a way to render the cluster value we want if a feature like this gets added! |
I've been really busy but I would love to work on this again now that Arbex is ready. Maybe I can make some time this weekend to take a look. |
@senoj you can do that right now with the external supercluster library, see ryanbaumann's post above. I successfully used that technique to render pie charts showing the ratios of different properties in the clusters. It'll definitely be simpler when that's part of the mapbox-api, but if your business needs are urgent, you can do it now 👍 |
@foundryspatial-duncan its not urgent, if there is no traction on this over the next month ill look at this. Id hate to put the work in and shortly after have it added to the mapbox-api :) |
@senoj @foundryspatial-duncan I also started to implement it with supercluster. This is working quite well. But I would also prefer, when it's integrated in the mapbox-api. |
@kkaefer @Qwervo @lamuertepeluda @JoJ123 @tvler @senoj I created a PR for this (#7004). I'm sure the syntax will change slightly after it's reviewed by the maintainers, but if you want to test it out now, that would help find any bugs or issues that may arise. Any feedback is welcome. Thanks! Also, if anybody is interested in helping out, I still need to do some benchmarking, make sure the documentation is up to date everywhere, etc. Any help would be much appreciated! |
Picking this up again, here's a PR proposing a different syntax/implementation for this feature #7584 — check it out! |
Last call for comments on #7584! It's ready for review and likely to be merged soon. |
Hello, I've started exploring this lib and have a feature request:
Provide reduce function(s) to clustering configuration, so it's possible to filter clusters by aggregated properties rather than just
point_count
.The text was updated successfully, but these errors were encountered: