Change the way changesets are displayed to use achavi #1376
Because nobody's written it; patches welcome.
Any usage of Achavi would have to handle the "changeset too big for Achavi" problem, which happens with e.g. http://www.openstreetmap.org/changeset/43855051.
Well, I wasn't taking "use achavi" literally. Aside from anything else, I have absolutely no idea how it works, but presumably it has some parallel database that we wouldn't want or need to replicate. I was just taking it as a request for an "achavi-like" view. Performance would certainly be an issue: the example above was, I think, too slow to load to be usable on the main site, which suggests it might be even more problematic to do it from the main database, which isn't optimised for this use.
It uses Overpass augmented diffs, afaik. Adapting it for use on OSM would certainly be possible, but we would clearly need to address the issue of being dependent on Roland's Overpass instances (which, as we know, are overloaded), as we already are with the explore function.
So there are issues that mean we don't want to use achavi as the main way to display changesets. OK, I understand. How about a minimal plan, then? Let's add a link to achavi in the standard changeset display. Everyone uses it at their own risk, and it should be very easy to implement this way.
As a matter of policy we don't normally link to third-party services like that: there are dozens of changeset analyzers that people have built, and if we link to one then they will all feel entitled to a link.
To a certain extent this seems like a duplicate of https://trac.openstreetmap.org/ticket/1775. See also OWL.
Achavi is not scalable. Currently it takes a minute or so to query a single changeset. If we direct hundreds of mappers to it, the Overpass API might choke.
Could we not cache the output of the Overpass query that Achavi runs, so that the query only happens once per changeset? Clearly this would only work once the changeset is closed, but it would mean a much faster load for subsequent users and less load on Overpass. I was attempting to build something like this, which would automatically generate "before and after" changeset files based on minutely diffs, but ran out of time to set up the full history database. It would be interesting to have the main OSM database produce this sort of thing as an output alongside the minutely diffs.
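A minimal sketch of the cache-once-per-closed-changeset idea, assuming a hypothetical `fetch_diff` callable that stands in for the expensive Overpass query (neither the class nor the callable is an existing OSM or Achavi API). Closed changesets are immutable, so their result can be kept; open ones are always re-fetched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ChangesetDiffCache:
    """Hypothetical cache: stores query results for closed changesets only."""
    fetch_diff: Callable[[int], str]  # assumption: runs the expensive upstream query
    _store: Dict[int, str] = field(default_factory=dict)
    upstream_calls: int = 0  # how often the upstream query actually ran

    def get(self, changeset_id: int, is_closed: bool) -> str:
        if is_closed and changeset_id in self._store:
            return self._store[changeset_id]  # served from cache, no upstream load
        self.upstream_calls += 1
        diff = self.fetch_diff(changeset_id)
        if is_closed:
            # A closed changeset can no longer change, so its result is safe to keep.
            self._store[changeset_id] = diff
        return diff


cache = ChangesetDiffCache(fetch_diff=lambda cs: f"diff for {cs}")
cache.get(43855051, is_closed=True)
cache.get(43855051, is_closed=True)
print(cache.upstream_calls)  # 1
```

The saving only materialises when the same closed changeset is requested more than once; each first view still costs one full upstream query.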
Well, that only helps if a small number of changesets are viewed repeatedly, which doesn't seem likely. It sounds like all of this is moot until we have our own Overpass server anyway, as we don't want to put any more load on the current ones, which are already struggling.
Is there an OSMF Overpass server in the works that would unblock this 🚀?
It's on the wish list.
From those I've talked to, a single Overpass instance isn't enough for the load that Achavi on osm.org would impose, and certain types of changesets are too much for the method Achavi uses. It doesn't make efficient use of resources, probably because this is not what Overpass was originally designed for.
It would be good to have some examples of such changesets, and maybe the time of day when the query is executed. Keeping an eye on the current CPU usage might also be useful to see whether this is a general issue or "just" related to CPU load. Edit: adding a link to the known Achavi issue with changesets spanning several hours and/or having a large bbox: nrenner/achavi#9; updating drolbr/Overpass-API#322. Are there any other issues not yet covered? FYI: I have also put up a demo Achavi service, which queries a dev Overpass instance with performance improvements not yet available on the main instance: http://dev.overpass-api.de/tmp/psv/achavi/index.html?changeset=38125857. For testing, you can simply adjust the changeset number as needed and play around a bit. I'd be very interested to get some feedback on response times. The demo database runs on hard disks and was last updated on November 19th, 2016. It includes all changesets since the ODbL license change in Sep 2012. Copying Roland, @drolbr.
We're making some (albeit slow) progress on large changesets, like www.openstreetmap.org/changeset/45018920 (btw: kudos to @nrenner for supporting this effort!). As you can see in the performance measurement results, an […] Now, the main drawback at this time is that we need a list of relevant {node/way/rel} ids, which we retrieve from the main OSM API and later pass to the Overpass query. Of course, it would make more sense to get those ids per changeset directly from the Overpass API. That's still a todo for @drolbr, I assume, and my expectation is that we won't need the main OSM API further down the road. This complements efforts by @geohacker to get decent history visualization, where caching is not yet available for older changesets, as with the initial example of cs 45018920: https://osmcha.mapbox.com/changesets/45018920
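To illustrate the "get the ids from the main API, then pass them to Overpass" step, here is a sketch assuming the ids come from the osmChange document that the OSM API v0.6 returns for `GET /api/0.6/changeset/<id>/download`. The function names are hypothetical, and the output is only an Overpass QL `id:` union; the exact query Achavi sends (time range, augmented-diff settings, output mode) is not described in this thread and is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# osmChange element names mapped to the Overpass QL statement names used below.
OSM_TYPES = {"node": "node", "way": "way", "relation": "rel"}


def ids_by_type(osmchange_xml: str) -> Dict[str, List[int]]:
    """Collect the ids touched by a changeset from an osmChange document."""
    root = ET.fromstring(osmchange_xml)
    found: Dict[str, Set[int]] = {kind: set() for kind in OSM_TYPES}
    for action in root:         # <create>, <modify> and <delete> blocks
        for element in action:  # <node>, <way> and <relation> elements
            if element.tag in found:
                found[element.tag].add(int(element.get("id")))
    return {kind: sorted(ids) for kind, ids in found.items()}


def overpass_id_union(ids: Dict[str, List[int]]) -> str:
    """Build an Overpass QL union that selects those objects by id."""
    statements = [
        f"{OSM_TYPES[kind]}(id:{','.join(str(i) for i in ids[kind])});"
        for kind in OSM_TYPES
        if ids[kind]
    ]
    return "(" + "".join(statements) + ");"


sample = """<osmChange version="0.6">
  <create><node id="1" lat="0" lon="0"/></create>
  <modify><way id="10"/><node id="2" lat="0" lon="0"/></modify>
  <delete><relation id="100"/></delete>
</osmChange>"""
print(overpass_id_union(ids_by_type(sample)))
# (node(id:1,2);way(id:10);rel(id:100););
```

For a real API response, passing the raw bytes to `ET.fromstring` avoids any issues with the XML encoding declaration.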
As an alternative, we could move the changeset caching by @geohacker to OSMF-supported hardware and use it for displaying changeset geometries. Having these go back only to 2012 seems good enough to me, and there won't be any requests to a third-party service. For now, I often get temporarily blocked from Overpass just by using the query tool on the website 3-4 times in a row; who knows what happens if it is used for 20 changesets at once, with infinite scrolling enabled.
Well, that reflects the situation on the current production server only. It uses a different DB compression and doesn't have all performance and scalability fixes in place, which leads to rather high overall CPU utilization. For those reasons I don't recommend using it as a reference point for performance testing. The tests I mentioned earlier all ran on an otherwise more or less idle development server, where only minutely updates are being applied. Also, you wouldn't display details for 20 changesets at once. As in OSMCha, you start with a list of changeset metadata (id, description), and only once you drill down to a particular changeset are further details down to the node/way/rel level shown. Storing all changesets as GeoJSON needs a lot of disk space and is probably not the most efficient approach. Overpass API gave up on a similar idea a few years back due to space constraints. I don't know what it would take to store all changesets back to 2012 using that approach, but disk requirements might be in the TB range. By contrast, the Overpass DB is at 200-240 GB (depending on compression), including full history back to Sep 2012. Besides, @geohacker still uses Overpass to populate the cache, so every effort to make the original Overpass queries run more smoothly (or at all) would also immediately help his caching approach: win-win.
You are definitely doing an important job. Speeding up the Overpass API and finding faster ways of working with it is obviously useful. I am trying to map your numbers onto the current history page of OSM, which displays 20 changesets at once, and that gives me 3-5 seconds for a user. I don't know the average size of a GeoJSON-encoded changeset, but if it is under 100k (which means under 10k gzipped, since changeset JSONs are very verbose), that would require under 500 GB for all the changesets, which is smaller than a rendering database.
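The 500 GB figure can be checked by working backwards. This sketch assumes decimal units (1 kB = 1,000 bytes, 1 GB = 1,000³ bytes) and the per-changeset sizes quoted above; the helper names are illustrative only.

```python
# Assumption: decimal units, as is usual for disk sizes.
KB = 1_000
GB = 1_000 ** 3


def max_changesets(budget_gb: float, bytes_per_changeset: int) -> float:
    """How many changesets fit into a given disk budget."""
    return budget_gb * GB / bytes_per_changeset


def total_storage_gb(changeset_count: int, bytes_per_changeset: int) -> float:
    """Disk needed to store that many changesets at a fixed average size."""
    return changeset_count * bytes_per_changeset / GB


print(max_changesets(500, 10 * KB))            # 50000000.0 gzipped changesets in 500 GB
print(total_storage_gb(50_000_000, 100 * KB))  # 5000.0 GB for the same count uncompressed
```

So the estimate holds for up to roughly 50 million changesets if the data is gzipped; stored uncompressed, the same count would need about 5 TB.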
Ah, OK. I thought we were discussing how a single changeset should be rendered, as initially mentioned by @rmikke in this issue. I don't know what it means from a UX perspective to load data for 20 changesets at once, which may be quite large. Before tackling an even more difficult problem, I would propose focusing on a single changeset for the time being...
I'd love to see those figures based on @geohacker's experience with the current cache.
Similar to how the giant Edit button gives you the option to choose between iD, Potlatch 2, JOSM or Merkaartor, it would be very useful for mappers to have a dropdown on the changeset page to open a changeset in a variety of third-party changeset analyzers, with the default being whichever one the user last chose or set in their preferences.
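A sketch of how such a dropdown could be populated, assuming a hypothetical table of URL templates keyed by analyzer name. The two templates reuse links already quoted in this thread (the Achavi one is the dev demo instance); neither the table nor the function is part of the osm.org codebase.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical URL templates, built from links quoted earlier in this issue.
ANALYZERS: Dict[str, str] = {
    "Achavi": "http://dev.overpass-api.de/tmp/psv/achavi/index.html?changeset={id}",
    "OSMCha": "https://osmcha.mapbox.com/changesets/{id}",
}


def analyzer_options(changeset_id: int,
                     preferred: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (label, url) pairs, with the user's preferred analyzer first."""
    names = sorted(ANALYZERS)
    if preferred in ANALYZERS:
        # Move the last-used or configured analyzer to the top as the default.
        names.remove(preferred)
        names.insert(0, preferred)
    return [(name, ANALYZERS[name].format(id=changeset_id)) for name in names]


for label, url in analyzer_options(45018920, preferred="OSMCha"):
    print(label, url)
```

Keeping the list data-driven means adding or removing an analyzer is a one-line change, which matters given the policy concern about which third-party services get listed.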
100k per GeoJSON file is spot on for the 10,000 changesets I checked. OSMCha doesn't use compression, which means that around 1.5-2 TB of uncompressed GeoJSON data might have accumulated on AWS S3 in the meantime. It seems that @geohacker is no longer with Mapbox, so we will probably never find out the exact numbers. By the way, is everyone using OSMCha these days, so that this issue is no longer relevant?
Hi @mmd-osm, your estimate sounds about right. I'll copy @jinalfoflia, since she's at Mapbox and can probably help us look into this if needed. I think everyone has adopted OSMCha for inspecting changesets, so this may not be relevant in the short term. But I do think having this view on osm.org is valuable in the long run.
An alternative approach to Achavi is proposed here: https://www.openstreetmap.org/user/TrickyFoxy/diary/405188
When I get a link to a changeset, all I see on the map is a rectangle containing all the changes. Like this.
Why don't we use the Achavi changeset view? It shows much more.
If Achavi is not considered stable enough to replace the current view, there should at least be a link to Achavi at the end of the change list: