Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node independent filter middleware #732

Merged
merged 11 commits into from
Sep 27, 2018

Conversation

dylanjw
Copy link
Contributor

@dylanjw dylanjw commented Apr 1, 2018

What was wrong?

See #551 (tldr; node filters suck)

How was it fixed?

Added a middleware that stores a "filtering generator" keyed by filter id. Calls to getFilterChanges will retrieve the stored generator and retrieve logs via eth_getLogs.

Features implemented:

  • Limit the eth_getLogs request size by set number of blocks.
  • Works with dynamic block ranges
    • toBlock = 'latest'
    • toBlock = 'pending' (need to test)
  • eth_newFilter
  • eth_getFilterChanges
  • eth_getFilterLogs
  • block filters
  • [ ] pending transaction filters

TODO:

  • Clean up tests: split tests up; unittests to tests/core/middleware, functional/integration to tests/core/filtering
  • Write more tests. In particular:
    • Check for missing or overlapping blocks in the results. (assumptions were made)

Cute Animal Picture

Cute animal picture

@dylanjw dylanjw changed the title Add new filter middleware prototype node independent filter middleware Apr 1, 2018
def unload_latest_blocks(web3, fromBlock, toBlock):
_fromBlock = fromBlock

current_block = int(web3.eth.blockNumber)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something returning a non-integer for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I misread an error and added this and didnt remember to take them out.

return _start, _end


def unload_latest_blocks(web3, fromBlock, toBlock):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what this function is used for (still ingesting the PR). Maybe add a docstring to explain the intent?

Copy link
Contributor Author

@dylanjw dylanjw Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a docstring. I was having trouble naming this function.

It returns block ranges starting from fromBlock to the latest mined block, until reaching toBlock. e.g.:

>>> blocks_to_filter = unload_latest_blocks(w3, 0, 50)
>>> next(blocks_to_filter)  # latest block number = 11
(0, 11)
>>> next(blocks_to_filter)  # latest block number = 45
(12, 45)
>>> next(blocks_to_filter)  # latest block number = 50
(46, 50)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • get_block_ranges
  • get_latest_block_ranges
  • iter_block_rangest
  • iter_latest_block_ranges

Don't particularly love any of those. The iter_ prefixed ones seems a bit more descriptive/semantically correct.

# no new blocks
if current_block < _fromBlock:
yield (None, None)
# reach stop_block after being passed by current_block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no stop_block in the context here. Is this comment outdated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

return _start, _end


def unload_latest_blocks(web3, fromBlock, toBlock):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've maintained using fromBlock instead of from_block in most public APIs, but for internal APIs I'd prefer we stick with from_block style casing.

address=None,
topics=None,):

if not fromBlock or fromBlock == 'latest':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not fromBlock will result in fromBlock == 0 not iterating from the genesis block as expected, but rather just starting at the current block which looks like a bug.

_fromBlock = fromBlock

if _fromBlock > int(web3.eth.blockNumber):
raise ValueError("fromBlock ({0}) has not yet been mined".format(fromBlock))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this needs to be an error case. Why should we forbid someone setting up a filter in advance? If you want to punt on implementation can you just change this to a NotImplementedError and change the error messaging to indicate that this just hasn't been implemented yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can implement future block ranges.

filter_id_counter = map(to_hex, itertools.count())

def middleware(method, params):
if method in (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to both use a set in places where you are only doing membership checking and to elevate this to a module level constant so we don't have to allocate it every time the filter is called.


filters[filter_id] = _filter
return {'result': filter_id}
elif method == 'eth_getFilterChanges' or method == 'eth_getFilterLogs':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like eth_getFilterLogs is supposed to have it's own separate implementation? (based on your PR checklist and the differences in their respective functionality)

@dylanjw dylanjw force-pushed the filtering-middleware branch 3 times, most recently from 5cf5c8c to cafbbe9 Compare April 3, 2018 03:37
@dylanjw
Copy link
Contributor Author

dylanjw commented Apr 3, 2018

Im starting to suspect that a pending transaction filter is not possible. Some of the reasons:

  • I think it would involve polling the transaction pool, but that carries the risk of missing transactions.
  • Infura does not yet have any tx pool apis.
  • API is node dependent (not an impossible problem).

Maybe we could add a recipe to the docs for an alternate way from node filters to get information on pending transactions, in lieu of having the filtering middleware handle it.

@pipermerriam
Copy link
Member

@dylanjw re pending transactions.

  1. This middleware doesn't need to be functional for anything but log filtering so feel free to drop block and pending transaction filters from scope.
  2. Pending transaction filters provide no guarantees that you'll see all of them anyways, a node can only provide info on pending transactions they know about and there will be plenty that they never learn about prior to being mined.
  3. since there's no guarantees around what pending transactions will be visible to the node and thus exposed in filters, I think it's ok for us to do the following.
  • A middleware specific to the transaction pool APIs which will only work if they are available (and fails hard if the endpoints are not available).
  • A middleware that only monitors for eth_sendTransaction and eth_sendRawTransaction and only returns transaction hashes that traverse those APIS

I still think I'd recommend separating this functionality from this PR and doing it as an independent PR to try and limit the scope of this feature.

@carver
Copy link
Collaborator

carver commented Apr 3, 2018

If you rebase this on master, some of these errors will be fixed, because master is upgraded to eth-tester devnum 22 @dylanjw

@dylanjw
Copy link
Contributor Author

dylanjw commented Apr 5, 2018

@pipermerriam I implemented the block filter in the PR because it was easy to do; most of the work is done with the iter_latest_block_ranges(). Is it ok to leave it here?

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments thus far.

with pytest.raises(expected):
w3.manager.request_blocking(method, args)
else:
assert w3.manager.request_blocking(method, args) == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you are reaching down into the w3.manager instead of using w3.eth.getFilterchanges, w3.eth.getFilterlogs, ...?

('eth_newFilter', [{}], '0x1'),
('eth_newPendingBlockFilter', [{}], NotImplementedError),
('eth_newBlockFilter', [{}], '0x2'),
('eth_getFilterChanges', ['0x0'], []),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this entry is requesting the filter which is setup here: https://github.com/ethereum/web3.py/pull/732/files#diff-ddc63b66c3f0acb70cca05f91281a51dR56

Assuming I'm reading that correct, it seems bad that this test is tightly coupled with state which was setup in that fixture, as well as that the test itself is stateful at the module level, which allows for state to leak across test runs. I think it'd be better if this could be refactored to:

  1. not have shared state across test runs
  2. setup the necessary state within this test.

yield web3.eth.getBlock(block_number).hash


def filter_middleware(make_request, web3):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nitpick:

  • log_filter_middleware (my preference)
  • event_filter_middleware
  • local_filter_middleware

Just the term filter_ doesn't quite seem specific enough.

"Incompatible start and stop arguments.",
"Start must be less than or equal to stop.")

def range_counter(start, stop, step):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why there is an outer and inner function call here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper is so the exception is raised when the generator is created. e.g.

without the wrapper:

>>> s = segment_count(5, 0)
>>> next(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "web3/middleware/filter.py", line 45, in segment_count
    "Start must be less than or equal to stop.")
TypeError: ('Incompatible start and stop arguments.', 'Start must be less than or equal to stop.')
>>> 

With the wrapper:

>>> s = segment_count(5, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "web3/middleware/filter.py", line 45, in segment_count
    "Start must be less than or equal to stop.")
TypeError: ('Incompatible start and stop arguments.', 'Start must be less than or equal to stop.')
>>> 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I wasn't aware that generators were lazy in that way. Neat and also kind of awkward. This deserves a comment imho as it's not clear just by reading the code.

)


def range_counter(start, stop=None, step=5):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this whole function can be made significantly simpler using cytoolz.sliding_window

def range_counter(start, stop=None, step=5):
    if stop == None:
        base_iter = itertools.count(start, step)
    else:
        # todo: if `stop` won't be included in the `range` output, do something 
        # like `cytoolz.concatv(range(start, stop, step), [stop]) to add `stop` to 
        # the list of values that will be generated.
        base_iter = range(start, stop, step)

    for left, right in sliding_window(2, base_iter):
        yield left, right

Haven't tested this code but I think it at least does close to what you are trying to do here with less mental overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you can accomplish the generator with validation on creation with the following pattern.

def my_gen():
    if thing:
        raise Exception
    return (v for v in things)

This is assuming that you can fit what you need within a comprehension and still maintain readability.

"""
# Return full range if all blocks exist
if to_block is not None and to_block <= web3.eth.blockNumber:
return from_block, to_block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this does what you want.

In [14]: def f(v):
    ...:     if v:
    ...:         return 1, 2
    ...:     yield 3, 4
    ...:     yield 5, 6
    ...:

In [15]: tuple(f(True))
Out[15]: ()

In [16]: tuple(f(False))
Out[16]: ((3, 4), (5, 6))

I think this needs to be a yield statement followed by a blank return statement if you want to generate the single value and then terminate.

else:
yield from_block, current_block

yield from get_block_range(current_block + 1, to_block)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern isn't going to work as it'll hit the recursion depth limit quite quickly since every step of the iteration requires a new stack frame. This needs to be refactored such that it isn't recursive.

def range_counter(start, stop, step):
def iter_ranges(start, step):
yield start, start + step - 1
yield from iter_ranges(start + step, step)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here. This solution is recursive and will easily hit the recursion limit. Needs to be refactored to something non-recursive.

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • missing a test for calling web3.eth.getFilterChanges with an unknown filter_id.
  • would be nice to actually test that the filters work as expected. We could potentially fuzz test this against an eth-tester backed filter via two separate web3 instances (one eth-tester and one middleware backed) to verify that they return the correct log entries.

def finite(start, stop, step):
# If the initial range is less than the step
# just return (start, stop)
if stop < step:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be if start + step >= stop?

if stop < step:
yield (start, stop)
return
for i in zip(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nitpick

for segment in zip(...):
    ...

yield (stop - remainder, stop)

def infinite(start, step):
for i in zip(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same naming nitpick s/i/segment

if remainder:
yield (stop - remainder, stop)

def infinite(start, step):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be elevated to a module level function.

>>> next(segment_counter) # Remainder is also returned
(9, 10)
"""
def finite(start, stop, step):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be elevated to a module level function.

@to_list
def block_hashes_in_range(web3, block_range):
for block_number in range(block_range[0], block_range[1] + 1):
yield web3.eth.getBlock(block_number).hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that web3.eth.getBlock returns None if block_number is greater than the current HEAD block number, which means this will raise an AttributeError if that case is met.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, I'll handle that case.

if method == 'eth_newFilter':
_filter = RequestLogs(web3, **apply_key_map(FILTER_PARAMS_KEY_MAP, params[0]))

if method == 'eth_newBlockFilter':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/if/elif and probably good to add an else that raises NotImplimented.

))


@pytest.fixture(scope='function')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not highly opposed to this but "function" is the default scope for pytest.fixture

iter_block_number.send(1)

with pytest.raises(NotImplementedError):
w3.eth.filter('pending')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this to it's own test?


def test_local_filter_middleware(w3, iter_block_number):
block_filter = w3.eth.filter('latest')
assert block_filter.filter_id == '0x0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of asserting the exact id since this is somewhat testing the implementation, we could have a test that spams a number of different filter creations and asserts that the set of filter_id values that are returned are all strings and are unique.

@dylanjw dylanjw force-pushed the filtering-middleware branch 2 times, most recently from b08c278 to e5ff02b Compare April 17, 2018 00:29
@dylanjw
Copy link
Contributor Author

dylanjw commented Apr 17, 2018

We could potentially fuzz test this against an eth-tester backed filter via two separate web3 instances (one eth-tester and one middleware backed) to verify that they return the correct log entries.

@pipermerriam I added a parametrized web3 fixture to tests/core/filtering/conftest.py that runs all the normal filtering tests against the middleware filter and the node filter. Is that what you mean? Or do you want tests with automated inputs? Im not too experienced with fuzz testing.

@pipermerriam
Copy link
Member

pipermerriam commented Apr 17, 2018

I added a parametrized web3 fixture to tests/core/filtering/conftest.py

Looking at the tests that live under tests/core/filtering I'm not sure we're testing the things I'd like to test. The tests in there look to be more smoke tests that things that really provide coverage for the filter APIs.

Ideally, I think we need a test or set of tests that cover the following.

  • Filtering across a range of ~20 or more blocks to see that it handles stepping across larger block ranges appropriately.
  • Variants that are open ended on the toBlock side to see that it correctly handles new block mining.

@dylanjw
Copy link
Contributor Author

dylanjw commented May 16, 2018

I am including tests for the issues mentioned in #836, as well as longer and open ended block ranges.

@dylanjw
Copy link
Contributor Author

dylanjw commented Sep 21, 2018

Check for missing or overlapping blocks in the results. (assumptions were made)

Im getting a heisenbug, where in some cases all tests pass, and in others there are missing log results. Im guessing this could be due to skipping blocks in the ranged block getLogs calls.

@dylanjw
Copy link
Contributor Author

dylanjw commented Sep 23, 2018

I added a test that emits an event to 100 blocks in a row, and show that the a "local" filter finds every one. Doesnt add up with my theory that there are breaks in the eth_getLogs block ranges.

@dylanjw
Copy link
Contributor Author

dylanjw commented Sep 24, 2018

Not ready to merge. Test results are still flaky.

@dylanjw
Copy link
Contributor Author

dylanjw commented Sep 24, 2018

Function scoped pytest fixtures are not respected by Hypothesis, so the failures have to do with logs in the overlapping blocks between runs, where the chain state is not reset. Im surprised this hasnt been a bigger issue with flaky tests.

@carver
Copy link
Collaborator

carver commented Sep 25, 2018

Wow, I can confirm that only hypothesis fails here, reusing integer 2:

import pytest
import itertools

from hypothesis import given, strategies as st

counter = itertools.count()
hypothesis_already_seen = set()
parametrized_already_seen = set()


@pytest.fixture
def next_int():
    return next(counter)


@pytest.mark.parametrize('ignored_text', ('', 'a'))
def test_parametrize_and_fixtures(next_int, ignored_text):
    assert next_int not in parametrized_already_seen
    parametrized_already_seen.add(next_int)


@given(st.text())
def test_hypothesis_and_fixtures(next_int, ignored_text):
    assert next_int not in hypothesis_already_seen
    hypothesis_already_seen.add(next_int)

I would consider this a hypothesis bug. Do you want to open it? If not, I'm happy to.

@carver
Copy link
Collaborator

carver commented Sep 25, 2018

Nevermind, it's already there, and old... 😢 HypothesisWorks/hypothesis#377

@carver
Copy link
Collaborator

carver commented Sep 25, 2018

There is a workaround, so at least you can check locally if the scoping issue is the only problem you're seeing: https://github.com/untitaker/pytest-subtesthack/

@dylanjw
Copy link
Contributor Author

dylanjw commented Sep 25, 2018

I have gotten around the issue by taking a snapshot and restoring at the beginning of the test function. The issue with hypothesis actually uncovered a bug, where the local filters were picking up events from the last block of the previous hypothesis run.

@dylanjw
Copy link
Contributor Author

dylanjw commented Sep 26, 2018

Im not getting flaky test runs anymore. @carver @pipermerriam Ready for final review.

@pipermerriam
Copy link
Member

@carver assigned to you, since this was a concept I came up with I think it's valuable for you to do primary review (cause bias...), I can do another pass when you are done if you think it will help.

Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few things, but then it looks good to go!

range(start, stop - step, step),
range(start + step, stop, step)):
range(start, stop - step + 1, step),
range(start + step, stop + 1, step)):
yield segment
else:
Copy link
Collaborator

@carver carver Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be in an else clause right? Seems like it should always run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a for/else loop, so it runs at the end of the for loop. It would work the same to remove the else clause, but probably safer to keep it in case anything is added to this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for/else is typically reserved for loops that have a break somewhere in them, in order to only run the else in whatever case the break does not get triggered (or inversely, having the ability to break, thus skipping the else code.

with pytest.raises(expected):
block_ranges(start, stop)
else:
for actual, expected in zip(block_ranges(start, stop), expected):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause problems if block_ranges returns a different number of pairs than expected. zip() will stop short, and could give a green test on a broken implementation.

One option is:

missing_flag = object()
comparing_pairs = itertools.zip_longest(block_ranges(start, stop), expected, fillvalue=missing_flag)
for actual, expected in comparing_pairs:
      assert actual == expected

Alternatively, you could capture them into a tuple first and check their length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-check on length gives an early exit and is probably preferrable to complicating the comparison loop with an abnormal exit condition which is that implicitely, nothing will evaluate equal to missing_flag.

actual_tuple = next(latest_block_ranges)
except StopIteration as e:
if e.value:
actual_tuple = e.value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iter_latest_block_ranges doesn't seem to ever return anything. What is the goal here? Since StopIteration only seems to be raised in an error case, maybe it shouldn't be caught at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some bad naming here. iter_block_number is a fixture that allows me to set what web3.eth.blockNumber returns. iter_latest_block_ranges is what is being tested, and sets up a generator latest_block_ranges that has its output assigned on line 115 and checked on line 120. The StopIteration catch is leftover from an older version of iter_latest_block_ranges when I was incorrectly returning the final value, instead of yield. I will take that out.

return (
(from_block, to_block - 1)
for from_block, to_block
in segment_count(start_block, last_block + 1, step)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value last_block could be None which blows up here.

Looking at the invocation, it looks like it's always defined, and never None. So maybe just drop the None default value in the signature, and you can drop the last_block is not None check above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took out the None handling from this function and segment_count

elif self._to_block == 'latest':
to_block = self.w3.eth.blockNumber

else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I'm not generally a fan of this gap before the else or elif statement. Up to you, though.

self.to_block,
self.address,
self.topics,
max_blocks=50)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably belongs in a constant with the one from line 235 also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants