Data version API #7
So the big question here is:
From @trickvi on June 30, 2013 9:28

Are there any data packages on data.okfn.org that aren't stored on GitHub? If so, is there some other way to generate a hash value, timestamps, etc.? I think the ETag value will always be closely tied to the underlying storage mechanism (and to how it tracks changes to resources). If it isn't possible to get such a value, you could always fall back on hashing/serving the version number (which works OK, but might not be updated) or on computing an MD5 hash of the resources (computation-heavy for data.okfn.org).

Then there is the issue of checking for updates. I think this depends on the data, but I don't think any data package is updated in real time, so we don't have to check for updates in real time either. It might be enough to check once a day or even once a week/month (I haven't looked at the existing data packages to infer what period is necessary). This value could be kept in memory as a kind of cache, but looking at GitHub's caching mechanism I wouldn't rely on it as a caching strategy.

These checks could of course be done by those who use the data, but I don't think that's the role of data package users. They just want to use the data, not build a mechanism to check for updates. They will most likely use some intermediate software that's unaware of the data context, so that software needs a generic caching strategy.
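For packages that are not on GitHub, the MD5 fallback could look roughly like this. It is a minimal sketch that assumes the resources are available as byte strings; the function name `etag_from_resources` is hypothetical. Each resource is hashed separately and the per-resource digests are hashed again, so that moving bytes from one resource to another still changes the value.

```python
import hashlib
from typing import Iterable


def etag_from_resources(resources: Iterable[bytes]) -> str:
    """Derive a strong ETag from the raw bytes of a package's resources.

    Hypothetical helper: hashes each resource on its own, then hashes the
    sequence of per-resource digests, so the result depends on both the
    content and the order of the resources.
    """
    combined = hashlib.md5()
    for data in resources:
        # MD5 is used here only as a change detector, not for security.
        combined.update(hashlib.md5(data).digest())
    # HTTP entity tags are quoted strings (RFC 7232).
    return '"' + combined.hexdigest() + '"'


if __name__ == "__main__":
    print(etag_from_resources([b"year,value\n2013,1\n", b"id,name\n1,a\n"]))
```

Because hashing every resource on each request is the expensive part, a server would probably compute this value once and refresh it on the daily or weekly schedule suggested above.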
Let's agree on the assumption of GitHub only for now; you're also suggesting you're happy with data being irregularly updated. In that case it shouldn't be too hard to do. The last question is which field this goes into. I'm guessing we use Last-Modified or similar.
From @trickvi on June 30, 2013 21:10

I'm rather fond of the ETag header over Last-Modified because you can do more with strings (e.g. hashes) than with dates (the ETag value can even be a date if you want to fall back on datapackage.json's last_modified value).

One thing I've been thinking about is suggesting an optional checksum value for resources in data packages (since they are retrieved separately). This value could also be used as the ETag value if it gets added to the standard (a discussion outside this issue).

So I vote for ETag instead of Last-Modified.
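The flexibility argued for here, an ETag that can carry a checksum, a date, or a version, can be sketched as a simple fallback order. This is an illustration under assumptions: the per-resource `checksum` field is only the optional field proposed in this comment (not part of the standard as discussed), `last_modified` follows the name used above, and the function `choose_etag` is hypothetical.

```python
import hashlib
from typing import Optional


def choose_etag(descriptor: dict) -> Optional[str]:
    """Pick an ETag for a datapackage.json descriptor (hypothetical policy).

    1. If every resource carries a (hypothetical) "checksum" field, combine them.
    2. Otherwise fall back on the descriptor's "last_modified" value.
    3. Otherwise fall back on "version" (which may lag behind data changes).
    Returns None when nothing usable is present.
    """
    resources = descriptor.get("resources", [])
    checksums = [r.get("checksum") for r in resources]
    if resources and all(checksums):
        # Strong tag: derived from the resource contents themselves.
        digest = hashlib.sha1("\n".join(checksums).encode("utf-8")).hexdigest()
        return '"' + digest + '"'
    for field in ("last_modified", "version"):
        value = descriptor.get(field)
        if value:
            # Weak tag (W/ prefix): a date or version only approximates the content.
            return 'W/"' + str(value) + '"'
    return None
```

Marking the date and version fallbacks as weak validators is a design choice: it tells caches the value signals "probably unchanged" rather than a byte-for-byte match, which matches the caveat above that a version number might not be updated.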
@tryggvib how important is this? At the moment I've given this only 1 star; you haven't been bugging me about it ;-) However, if this is a real blocker for your potential use case, it may be worth upgrading its priority :-)
From @trickvi on June 20, 2013 12:37
I would like to be able to use data.okfn.org as an intermediary between my software and the data packages it uses, and to quickly check whether a new version of the data is available (e.g. if I've cached the package on a local machine).
There are ways to do it with the current setup:
I propose that data.okfn.org provide an internal mechanism that lets users quickly check whether a new version may have been released. This does not have to be an API: we could leverage HTTP's caching mechanism with an ETag header containing some hash value. This hash value could, e.g., be the SHA of the head ref objects served via the GitHub API:
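On the server side, serving datapackage.json with such an ETag comes down to a conditional GET. The sketch below assumes the current ETag has already been obtained (for example, a head ref SHA wrapped in double quotes, as HTTP requires); no web framework is assumed, and the function names `not_modified` and `respond` are hypothetical. It compares tags with the weak comparison that RFC 7232 requires for `If-None-Match` and answers `304 Not Modified` with an empty body on a match.

```python
import re
from typing import Dict, Optional, Tuple

# Matches one entity tag, weak or strong: W/"..." or "..."
_ETAG_RE = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag: str) -> str:
    """Strip the weak indicator so tags are compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(current_etag: str, if_none_match: Optional[str]) -> bool:
    """True if the client's cached copy still matches current_etag."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True  # "*" matches any existing representation
    current = _opaque(current_etag)
    return any(_opaque(tag) == current for tag in _ETAG_RE.findall(if_none_match))


def respond(current_etag: str, if_none_match: Optional[str],
            body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for a GET of datapackage.json."""
    headers = {"ETag": current_etag}
    if not_modified(current_etag, if_none_match):
        return 304, headers, b""  # client keeps using its cached copy
    headers["Content-Type"] = "application/json"
    return 200, headers, body
```

A regular expression is used instead of splitting on commas because an entity tag's quoted value may itself contain commas.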
Software that works with data packages could then implement a caching strategy: send a GET request for datapackage.json with an If-None-Match header, and either receive a new version of the descriptor (and check the version in that file) or, on a 304 Not Modified response, serve the data from its cache.
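A client following this strategy might look like the sketch below. It keeps the last descriptor and its ETag, skips the network entirely if it revalidated within `max_age` (the discussion suggests a day or even a week is enough), and otherwise sends a conditional GET. The class `DescriptorCache`, the helper `urllib_fetch` and their parameters are hypothetical; the HTTP call uses the standard library's `urllib`, which reports a 304 as an `HTTPError`. The `fetch` parameter exists only so the caching logic can be exercised without a network.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

# fetch(url, request_headers) -> (status, etag or None, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Optional[str], bytes]]


def urllib_fetch(url: str, headers: Dict[str, str]) -> Tuple[int, Optional[str], bytes]:
    """Perform a (possibly conditional) GET with the standard library."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib raises on 304 Not Modified
            return 304, err.headers.get("ETag"), b""
        raise


class DescriptorCache:
    """Caches datapackage.json and revalidates it with If-None-Match."""

    def __init__(self, url: str, max_age: float = 24 * 3600,
                 fetch: Fetch = urllib_fetch) -> None:
        self.url = url
        self.max_age = max_age  # seconds between revalidations
        self.fetch = fetch
        self.etag: Optional[str] = None
        self.descriptor: Optional[dict] = None
        self.checked_at = float("-inf")

    def get(self) -> dict:
        now = time.monotonic()
        if self.descriptor is not None and now - self.checked_at < self.max_age:
            return self.descriptor  # checked recently: trust the cache
        headers = {"If-None-Match": self.etag} if self.etag else {}
        status, etag, body = self.fetch(self.url, headers)
        self.checked_at = now
        if status == 304 and self.descriptor is not None:
            return self.descriptor  # unchanged on the server: serve from cache
        self.descriptor = json.loads(body)  # new descriptor; check its version
        self.etag = etag
        return self.descriptor
```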
Copied from original issue: frictionlessdata/frictionlessdata.io#51