-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API for getting diffs #60
Conversation
Hoping to get this worked out before Thursday so Global Sprinters have a working example of diffing. |
Running on staging for testing, e.g: or https://web-monitoring-db-staging.herokuapp.com/api/v0/pages/76d2e99c-c1a5-44e5-9781-6622573b9444/changes/..5ef676d4-0811-4111-aa80-19d2999ab178/diff/source?format=json (for fancy JSON data) |
Side note: this includes a wonky patch for an encoding issue in HTTParty (jnunemaker/httparty#542), which was also causing import issues (if raw version content had to be parsed, which is not the case for most imports). |
require_dependency 'differ/simple_diff' | ||
|
||
if ENV['DIFFER_SOURCE'] | ||
Differ.register(:source, Differ::SimpleDiff.new(ENV['DIFFER_SOURCE'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be too clever to allow arbitrary many differs with arbitrary names to be configured through ENV
by picking up all vars with names that match DIFFER_*
? Or will new differs likely need something more complex that SimpleDiff
? Offhand I'm not sure what they would need so the approach I'm floating seems doable -- but maybe less than clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow arbitrary many differs with arbitrary names to be configured through ENV by picking up all vars with names that match
DIFFER_*
?
Ha! I thought about the same idea and then said “nah, let’s not get ahead of ourselves.” I think I would at least change the var to DIFFER_SIMPLE_*
if we did that.
I can imagine some differs being more complex (or maybe SimpleDiff
will just grow)—we can‘t meaningfully diff PDFs with this, for example, but we really should eventually have that capability. There’s a lot of PDF content that is “tracked” but has no real tooling to support analyzing it. Versionista is also hitting some audio files and potentially things like CSVs. I haven't really done a full audit of content types. By nature, I’m sure PF is grabbing who-knows-what variety of content types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one stray single comment. I read this through twice and the tests look good. I agree with the general API. Let's merge and iterate.
Similar route to annotations, since both are about a change between two versions: GET /api/v0/page/{page id}/changes/{from id}..{to id}/diff/{diff type}
There is a registry of diffing services that can be set with: Differ.register(type, service) And retrieved with: Differ.for_type(type) # returns nil for unknown type Differ.for_type!(type) # raises for unknown type A diffing service need only respond to the `diff(change_model, options_hash)` method. Right now there is only one diffing service: SimpleDiff. It's configurable with a URL and simply requests the URL with the query args: - `a`: the uri for the content of the start version - `b`: the uri for the content of the end version
It turns out HTTParty has a pretty bad bug with response encodings: jnunemaker/httparty#542 This is hopefully just a temporary patch.
Make Rubocop happy. This syntax is only meaningfully different when modules are used with classes, which we aren't doing. For our use, they're equivalent.
Merging so we’ve got something for Global Sprint. Once we’ve got more diff services, we can come back around and consider iterating through |
Fixes #8.
This is currently implemented on top of @b5’s simple diff service: https://github.com/edgi-govdata-archiving/go-calc-diff
…but should theoretically work with any HTTP differ that accepts requests with two query arguments:
a
: A URL to the content of the starting versionb
: A URL to the content of the ending versione.g:
http://diff-it.com?a=http%3A%2F%2Fexample.com%2Fv1&b=http%3A%2F%2Fexample.com%2Fv2
Diffs are retrieved by requesting the URL:
(Meant to mirror the
annotations
API)query_params
are simply passed along to the underlying diff service. For example,go-calc-diff
can take aformat
argument that determines how the response is formatted (JSON, HTML document, HTML snippet):That returns a response like:
If the response was JSON data,
content
will actually an object. Otherwise, it will be a string.We only support one
diff_type
right now (source
), but the system is designed to be expandable. Under the hood, there is a registry that maps diff type names to actual diff service implementations. Implementations only need to respond to one method:For now, we just have one
SimpleDiff
class, which is instantiated with a URL. It can be used to interact with any diff service that matches the same protocol asgo-calc-diff
.