Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Http backend #61

Merged
merged 5 commits into from
Oct 17, 2018
Merged

Http backend #61

merged 5 commits into from
Oct 17, 2018

Conversation

njgheorghita
Copy link
Contributor

@njgheorghita njgheorghita commented Jul 2, 2018

What was wrong?

URI backend needed to serve content-addressed github URIs

How was it fixed?

Wrote backend that serves github URIs conforming to

https://raw.githubusercontent.com/user/repo/commit_hash/path/to/manifest.json

Cute Animal Picture

image

@njgheorghita njgheorghita force-pushed the http-backend branch 3 times, most recently from d7e9ea0 to b5c6c98 Compare October 14, 2018 20:32
@njgheorghita
Copy link
Contributor Author

@pipermerriam ping for review. The scheme i'm using here isn't quite exactly, what's discussed in the ERC 1319](ethereum/EIPs#1319), I'm not sure if that's a problem or not. IMO just using the raw.githubusercontent.com authority is simpler than creating/fetching blobs as suggested in the spec. Any thoughts? Obvs, i'd like to conform to the spec wherever possible, but this is also an area that imo doesn't require strict conformity.

I also thought about requiring a content hash fragment suffixing the github uri . . . i.e.
https://raw.githubusercontent.com/ethpm/ethpm-spec/481739f6138907db88602558711e9d3c1301c269/examples/owned/contracts/Owned.sol#bfdea1fa5f33c30fee8443c5ffa1020027f8813e0007bb6f82aaa2843a7fdd60
for an added layer of security, but I recently removed that since the commit-hash is already in the uri, didn't seem necessary to have two content hashes

@pipermerriam
Copy link
Member

@njgheorghita on not including the hash in the URL: I don't think this is ok. While I agree that it doesn't allow for arbitrary modification of the resouce by the user, it doesn't provide protection against github serving up incorrect content. I acknowledge this probably isn't a concern we're likely to see realized, but I'd prefer we do it correctly and both require the hash as well as verifying the payload we receive matches said hash.

Which makes me realize that we should be doing the same with the IPFS gateway backends (and probably generically via any of the IPFS backends) just to be doubly sure that the content we receive is indeed the content that should have been at the address.

@njgheorghita njgheorghita force-pushed the http-backend branch 2 times, most recently from 6a4d4f2 to 04c78eb Compare October 16, 2018 15:30
@njgheorghita njgheorghita force-pushed the http-backend branch 4 times, most recently from 6cc6b74 to a2fe704 Compare October 16, 2018 21:54
@njgheorghita
Copy link
Contributor Author

@pipermerriam Makes sense, added the requirement for a valid content hash to the end of github uris. And added a validation to IPFS backend that the hash of the retrieved contents matches the content hash in the uri

@njgheorghita njgheorghita force-pushed the http-backend branch 2 times, most recently from 5277de2 to ac6b813 Compare October 16, 2018 22:04
Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I evidently never committed these two comments.

@pytest.mark.parametrize(
"uri",
(
"https://raw.githubusercontent.com/ethpm/ethpm-spec/3945c47dedb04930ee12c0281494a1b5bdd692a0/examples/owned/1.0.0.json", # noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth pointing out that this test will fail occasionally in CI since it makes network requests. I think that's ok for now, but it might be worth leaving a note in the test indicating as much.

("https://raw.githubusercontent.com", False),
# valid github urls
("http://raw.githubusercontent.com/any/path", True),
("https://raw.githubusercontent.com/any/path", True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing this will result in people not using content addressed URIs, which will result in people not correctly linking to commit based files which will lead to people using URIs like https://raw.githubusercontent.com/ethpm/ethpm-spec/master/examples/owned/1.0.0.json which will lead to people getting content that isn't guaranteed to remain static and I don't think that's something we want to allow.


.. py:method:: BaseURIBackend.can_resolve_uri(uri)

Returns a bool indicating whether or not this backend is capable of resolving the given URI to a manifest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to cleanup now but:

  1. Wrong ?tense?. I think the official term is "imperitive present tense". See: Add BeaconStateMachine outline ethereum/py-evm#1373 (comment)
  2. These are more suited for API docs and thus can be done with autodoc and docstrings. Take a look at how @cburgdorf has structured the documentation in py-evm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 autodoc, moving all (or as many) of the docs over to autodoc is next on my todo list

- Go to the target manifest in your browser.
- Press ``y`` to generate the permalink in the address bar.
- Replace ``"github"`` with ``"raw.githubusercontent"``, and remove the ``"blob"`` namespace from the URI.
- Suffix the URI with ``#`` followed by the ``keccak`` hash of the bytes found at the Github URI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but it'd be pretty cool to provide a utility on the GithubOverHTTPSBackend that did this automatically with a uri and supported a handful of formats.

>>> GithubOverHTTPSBackend.get_permalink_with_content_hash('https://github.com/ethpm/ethpm-spec/blob/master/examples/owned/1.0.0.json')
"https://raw.githubusercontent.com/ethpm/ethpm-spec/3945c47dedb04930ee12c0281494a1b5bdd692a0/examples/owned/1.0.0.json#0x01cbc2a69a9f86e9d9e7b87475e2ba2619404dc8d6ee3cb3a8acf3176c2cace1"

Should be able to:

  • parse the repository name and file path and branch
  • use the github API to pull the latest commit hash
  • generate the raw.github.com URL
  • pull the file and compute the keccak of the contents.

This would be something well defined enough that it could be bountied.

return False

def fetch_uri_contents(self, uri: str) -> bytes:
http_uri, validation_hash = uri.split("#")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use urllib.parse.urlpase and pull the fragment from that return value.

validation_hash = generate_file_hash(contents)
if validation_hash != ipfs_hash:
raise ValidationError(
"Hashed IPFS contents retrieved from uri: {0} "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're python 3.6+ you can start using f-strings for formatting: https://realpython.com/python-f-strings/

raise ValidationError(
"Invalid content-addressed URI. "
"Validation hash:{0} does not match the hash of URI contents: {1}.".format(
decoded_validation, hashed_contents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably good to output these in their hex encoded formats for friendlier error messages.

@@ -104,6 +104,12 @@ def _get_factory(package, factory_name):
return _get_factory


@pytest.fixture
def owned_contract():
with open(V2_PACKAGES_DIR / "owned" / "contracts" / "Owned.sol") as file_obj:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can also do (V2_PACKAGES_DIR / "owned" / "contracts" / "Owned.sol").open() but I'm undecided on whether I think it's a good pattern. The outer parenthesis wrapping is sort of awkward. Not advocating change, just bringing up my thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the parenthesis wrapping being weird, but i'm a fan of pathlib and think the weirdness is worth it

@@ -43,7 +43,7 @@ def test_get_dependency_package(dependencies):

def test_validate_build_dependencies(dummy_ipfs_backend, piper_coin_manifest):
result = validate_build_dependency(
"standard-token", "ipfs://QmVu9zuza5mkJwwcFdh2SXBugm1oSgZVuEKkph9XLsbUwg"
"standard-token", "ipfs://QmVu9zuza5mkJwwcFdh2SXBugm1oSgZVuEKkph9XLsbUwg#0x123"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬 nothing 😬

@njgheorghita njgheorghita merged commit 19cc54a into ethpm:master Oct 17, 2018
@njgheorghita njgheorghita deleted the http-backend branch October 17, 2018 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants