
Achavi "seems not to work" for some changesets #9

Open · SomeoneElseOSM opened this issue Mar 2, 2015 · 12 comments

@SomeoneElseOSM

An example of this is:

http://www.openstreetmap.org/changeset/28764270

http://nrenner.github.io/achavi/?changeset=28764270

What happens is that a spinner appears for a while and then stops, with no obvious error message; nothing is displayed on the screen. I'm guessing that something underneath is failing, but it would be nice to know what.

@nrenner
Owner

nrenner commented Mar 4, 2015

Agreed, there should be an error message.

The error is a timeout of [this query](https://overpass-api.de/api/interpreter?data=[adiff:"2015-02-11T01:30:39Z","2015-02-11T01:30:44Z"];%28node%28bbox%29%28changed%29;way%28bbox%29%28changed%29;%29;out meta geom%28bbox%29;&bbox=-6.9433593,22.7401159,83.3402162,66.5498634):

[adiff:"2015-02-11T01:30:39Z","2015-02-11T01:30:44Z"];
(node(bbox)(changed);
  way(bbox)(changed););
out meta geom(bbox);
&bbox=-6.9433593,22.7401159,83.3402162,66.5498634

Overpass API does not support querying changesets, so my workaround is to query by the changeset's bbox and time range and to filter client-side, which does not really work for large (bbox and/or time) and overlapping changesets.

But in this case, omitting the bbox filter seems to be much faster. I probably need to investigate when to use which filter.
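
For illustration, the two variants look roughly like this as curl commands (a sketch using the endpoint and parameters from the linked URL, not the exact requests achavi sends):

```sh
# Variant 1: adiff over the changeset's time range, restricted to its bbox
# (the query that times out for changeset 28764270).
curl -G 'https://overpass-api.de/api/interpreter' \
  --data-urlencode 'data=[adiff:"2015-02-11T01:30:39Z","2015-02-11T01:30:44Z"];(node(bbox)(changed);way(bbox)(changed););out meta geom(bbox);' \
  --data-urlencode 'bbox=-6.9433593,22.7401159,83.3402162,66.5498634'

# Variant 2: the same time range without the bbox filter, which for this
# changeset appears to be much faster.
curl -G 'https://overpass-api.de/api/interpreter' \
  --data-urlencode 'data=[adiff:"2015-02-11T01:30:39Z","2015-02-11T01:30:44Z"];(node(changed);way(changed););out meta;'
```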

@daganzdaanda

Thanks for explaining! I have wondered about this sometimes as well. Showing a little message when any error comes up would be nice.
Thank you for this extremely helpful tool!

@daganzdaanda

You wrote:

> so my workaround is to query by the changeset's bbox and time range and to filter client-side, which does not really work for large (bbox and/or time) and overlapping changesets.

Would it be possible to divide large bounding boxes into several smaller queries to make sure that there are no timeouts? The results would then need to be filtered for duplicates, but maybe that is not too hard since you are filtering already?
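
Hypothetically, something like this (a 2x2 split of the example changeset's bbox; merging and deduplication are only sketched in a comment):

```sh
# Hypothetical tiling: split the bbox (minlon,minlat,maxlon,maxlat) into four
# tiles and run the same adiff query per tile.
W=-6.9433593; S=22.7401159; E=83.3402162; N=66.5498634
MX=$(echo "($W + $E) / 2" | bc -l); MY=$(echo "($S + $N) / 2" | bc -l)
for tile in "$W,$S,$MX,$MY" "$MX,$S,$E,$MY" "$W,$MY,$MX,$N" "$MX,$MY,$E,$N"; do
  curl -sG 'https://overpass-api.de/api/interpreter' \
    --data-urlencode 'data=[adiff:"2015-02-11T01:30:39Z","2015-02-11T01:30:44Z"];(node(bbox)(changed);way(bbox)(changed););out meta geom(bbox);' \
    --data-urlencode "bbox=$tile" > "tile-$tile.xml"
done
# The per-tile responses would then be merged, dropping elements that appear
# in more than one tile, on top of the existing client-side changeset filter.
```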

@nrenner
Owner

nrenner commented May 7, 2015

I guess omitting the bbox filter for larger bboxes would already help a lot. I'm still trying to make some progress with other stuff (brouter-web), but this issue will be one of the next things to address.

What I mean is that this approach feels like a waste of resources, even when split into manageable smaller queries, with changesets like those from wheelmap_visitor that span large areas and several hours. Also, client-side filtering has its issues, see #10.

A better approach for such cases might be to get the changeset details from the main API, then query for the diff of the individual object versions (similar to what OSM History Viewer and Cool Name Goes Here are doing).
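
Roughly, assuming the standard OSM API endpoints (the extraction below is only an illustration, not actual achavi code):

```sh
# 1. Fetch the changeset contents (osmChange) from the main API.
curl -s 'https://www.openstreetmap.org/api/0.6/changeset/28764270/download' \
  > cs-28764270.osc
# 2. Extract the ids of the changed nodes (the same works for ways/relations).
grep -o '<node id="[0-9]*"' cs-28764270.osc | grep -o '[0-9]*' | sort -nu > node-ids.txt
# 3. Each object's old and new versions can then be fetched and diffed,
#    e.g. via /api/0.6/node/<id>/history.
```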

@mmd-osm

mmd-osm commented Apr 1, 2017

> A better approach for such cases might be to get the changeset details from the main API, then query for the diff of the individual object versions

I second that. In drolbr/Overpass-API#358, I have introduced a new language extension that accepts a list of node/way/relation ids in a very compact format. It also leverages a more efficient implementation compared to the previous piecemeal approach (node(123);node(234);...).
Changesets spanning the whole globe would be a very good candidate for this feature, and we would have a new convincing use case for this pull request.

Baseline would be the list of node & way ids as returned by the OSM API: http://www.openstreetmap.org/api/0.6/changeset/28764270/download

Please take a look at the following query on the dev server: http://overpass-turbo.eu/s/nYa
The code has since been merged into 0.7.54: see http://overpass-turbo.eu/s/oiX

The longer-term solution will of course be to directly specify the changeset id in the query as a filter criterion: see drolbr/Overpass-API#367
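
With the multi-id filter available since 0.7.54, such an id-based adiff query could look roughly like this (the ids are placeholders, not taken from the linked examples):

```sh
# Placeholder ids; in practice they come from the changeset download above.
curl -G 'https://overpass-api.de/api/interpreter' \
  --data-urlencode 'data=[adiff:"2015-02-11T01:30:39Z","2015-02-11T01:30:44Z"];(node(id:123,456,789);way(id:1111,2222););out meta geom;'
```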

@nrenner
Owner

nrenner commented Jul 13, 2017

Thanks for implementing and suggesting the id filter.

I got stuck working on this issue again. And now with Changeset Map solving this and other issues at least for new changesets [1], my motivation to work on further workarounds has not really increased. So I'll just share what I have so far.

As I fear stressing the main API too much, I wanted to do some time measurements for API downloads, and also for id queries:

Examples for big changesets were selected using queries on the ChangesetMD database import. The tests were done with shell commands, using curl for the request timing and playing with osmium-tool and awk to build the id queries.
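
The timing part can be sketched like this (placeholder file and query names; the -w variables match the size_download / time_starttransfer / time_total columns reported below):

```sh
# Time a single adiff query: write the response to a file and print curl's
# built-in transfer metrics (the query file would be built beforehand,
# e.g. from osmium-tool/awk output).
curl -s -o cs-45229331.adiff.xml \
  -w 'size_download=%{size_download} time_starttransfer=%{time_starttransfer} time_total=%{time_total}\n' \
  'https://overpass-api.de/api/interpreter' \
  --data-urlencode 'data@cs-45229331.overpassql'
```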

The remaining question would be when to use id, bbox or changed filter.

> The longer-term solution will of course be to directly specify the changeset id in the query as a filter criterion: see drolbr/Overpass-API#367

Can you elaborate? I don't see how a filter would help us here as it still is restricted to current data or attic dates. A proper solution would probably require a new query type on the settings level in addition to the existing date/diff/adiff attic queries.

[1] Preparing accurate history and caching changesets

@mmd-osm

mmd-osm commented Jul 13, 2017

Thanks for the update!

> And now with Changeset Map solving this and other issues at least for new changesets

This one only starts some time in 2017, and I have no idea if this solution can keep up with the increasing number of changesets over time. As you know, Overpass API did something similar with augmented diffs in the past, but it was discontinued due to space constraints. I hope that's different now with Mapbox sponsoring. Still, I'm really glad there's a working solution around now.

As overpass-api.de usually has quite a heavy CPU load, I wouldn't recommend running performance tests on it. You could try dev.overpass-api.de/api_mmd/ instead. Please keep in mind that this instance uses hard disks, so re-run the same query at least twice to get results that reasonably reflect what response times would look like on SSDs.

> I don't see how a filter would help us here as it still is restricted to current data or attic dates.

That's right. "Filter" is not adequate terminology; as you said, the query result should reflect the changes of a single changeset, regardless of the timestamps of the individual changes. Maybe we need to stress this requirement even more.

@nrenner
Owner

nrenner commented Jul 14, 2017

> This one only starts some time in 2017

"March 1, 2017", according to the diary entry, and I was hoping for "We are considering doing a slow backfill, but this is entirely dependent on Overpass.".

> As you know, Overpass API did something similar with augmented diffs in the past, but it was discontinued due to space constraints.

Yes, huge storage needs were probably the main reason for developing the attic database with on-demand queries. And I also think that storing full geometries for changes is very inefficient and doesn't really scale.

But at least caching the latest changesets for a couple of weeks back should be feasible. And now that a server infrastructure is in place to support reading changesets, I'm sure alternative solutions can be found for older changesets that free the client from the need to implement various hacky workarounds.

For example, deriving and storing upload timestamps from edits, possibly with their smaller bounding boxes for multiple uploads within a changeset. Or, what just came to mind, storing the original OSM minutely diffs reorganized by changeset and using these for on-demand id filter queries.

> Maybe we need to stress this requirement even more.

I have already thought about discussing this on the overpass mailing list.

@mmd-osm

mmd-osm commented Jul 14, 2017

I tested your id query script on the api_mmd and api_new_feat endpoints I mentioned before. All measurements were repeated a few times until they stabilized, so that the overhead added by slow hard-disk accesses is no longer relevant.

There was one issue I had to fix in the Apache fcgid configuration to allow larger request sizes; otherwise the server would return HTTP 500 errors.
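
Assuming the limit in question was mod_fcgid's request body cap (the directive is not named in this thread, so this is a guess), the fix would look something like:

```sh
# Hypothetical: mod_fcgid rejects POST bodies larger than FcgidMaxRequestLen
# (default 131072 bytes) with an HTTP 500, so large id-list queries need a
# higher limit.
echo 'FcgidMaxRequestLen 33554432' | sudo tee -a /etc/apache2/mods-available/fcgid.conf
sudo systemctl reload apache2
```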

So here are the runtimes I got when measuring directly on the development server, which is fairly idle.

test758 branch

Using lz4 compressed database

| CS id | size_download | time_starttransfer | time_total |
| --- | --- | --- | --- |
| cs-45229331.adiff.xml | 678998 bytes | 0.006 s | 1.411 s |
| cs-45018920.adiff.xml | 64037445 bytes | 0.006 s | 4.145 s |
| cs-45031863.adiff.xml | 29485974 bytes | 0.005 s | 1.221 s |
| cs-45656403.adiff.xml | 39335569 bytes | 0.006 s | 11.074 s |
| cs-46858895.adiff.xml | 5309 bytes | 0.105 s | 0.105 s |
| cs-44989548.adiff.xml | 2915627 bytes | 0.006 s | 0.311 s |
| cs-46988696.adiff.xml | 1926804 bytes | 0.005 s | 0.350 s |
| cs-46552513.adiff.xml | 2289108 bytes | 0.005 s | 0.308 s |
| cs-45488240.adiff.xml | 9750251 bytes | 0.005 s | 0.569 s |
| cs-45489434.adiff.xml | 29199936 bytes | 0.006 s | 1.205 s |
| cs-45064759.adiff.xml | 13712262 bytes | 0.005 s | 0.690 s |
| cs-45639229.adiff.xml | 9550251 bytes | 0.005 s | 0.513 s |
| cs-45579952.adiff.xml | 12213546 bytes | 0.005 s | 0.635 s |

new_features branch

Using zlib compressed database, like on overpass-api.de

| CS id | size_download | time_starttransfer | time_total |
| --- | --- | --- | --- |
| cs-45229331.adiff.xml | 678998 bytes | 0.007 s | 4.491 s |
| cs-45018920.adiff.xml | 64019410 bytes | 0.005 s | 6.703 s |
| cs-45031863.adiff.xml | 29483020 bytes | 0.005 s | 3.027 s |
| cs-45656403.adiff.xml | 39335569 bytes | 0.005 s | 20.558 s |
| cs-46858895.adiff.xml | 5309 bytes | 0.629 s | 0.857 s |
| cs-44989548.adiff.xml | 2915431 bytes | 0.008 s | 1.070 s |
| cs-46988696.adiff.xml | 1926804 bytes | 0.007 s | 1.339 s |
| cs-46552513.adiff.xml | 2289108 bytes | 0.005 s | 0.664 s |
| cs-45488240.adiff.xml | 9750251 bytes | 0.005 s | 1.537 s |
| cs-45489434.adiff.xml | 29199936 bytes | 0.007 s | 2.282 s |
| cs-45064759.adiff.xml | 13712262 bytes | 0.007 s | 1.479 s |
| cs-45639229.adiff.xml | 9550251 bytes | 0.005 s | 0.908 s |
| cs-45579952.adiff.xml | 12213546 bytes | 0.005 s | 1.137 s |

@mmd-osm

mmd-osm commented Dec 27, 2018

I'm testing a new prototype right now: a changeset-id-based (changed) filter. Instead of pulling the object ids from the OSM API, it uses a new metadata filtering stage to accomplish the same. That's probably the fastest way to cut down the number of objects as early as possible without data model changes.

(runs on slow hard disks!)

@mmd-osm

mmd-osm commented Apr 11, 2020

@nrenner
Owner

nrenner commented Aug 1, 2020

Thanks for the update. This would be a cool feature and solve the large bbox issue - and it seems it's really fast, too!

What is the status of this?
