Raise IterationError on StopIteration #473

Open
wants to merge 12 commits into
base: master

Conversation

groutr
Contributor

@groutr groutr commented Oct 18, 2019

Resolves #465.
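For context, the class of failure named in the linked issue can be reproduced in a few lines. This is a minimal sketch, not the code from the PR: it assumes the pre-PR first was implemented as next(iter(seq)) (the old body is not shown in this thread), and old_first and new_first are hypothetical names. It uses map rather than a generator to show the effect: a StopIteration that escapes from a helper is read by the consumer as normal exhaustion, so the result is silently truncated.

```python
class IterationError(RuntimeError):
    """Stand-in for the exception class proposed in this PR."""


def old_first(seq):
    # Assumed pre-PR behaviour: StopIteration leaks out on empty input.
    return next(iter(seq))


def new_first(seq):
    # Behaviour shown in the tracebacks later in this thread.
    for rv in seq:
        return rv
    raise IterationError("Received empty sequence")


data = [[1], [], [2]]

# map() lets the StopIteration escape from its __next__, so list() treats it
# as the end of the iteration and drops everything after the empty list.
truncated = list(map(old_first, data))

# With a dedicated exception the problem surfaces instead of being swallowed.
try:
    list(map(new_first, data))
    outcome = "no error"
except IterationError as exc:
    outcome = "IterationError: {}".format(exc)
```

Inside a real generator on Python 3.7+, the leaked StopIteration is converted to a RuntimeError (PEP 479) rather than ending the generator, which is still less informative than a dedicated exception with its own message.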

@groutr changed the title from "Raise RuntimeError on StopIteration" to "[WIP] Raise RuntimeError on StopIteration" on Oct 18, 2019
@eriknw
Member

eriknw commented Nov 2, 2019

Thanks Ryan! I think I'd prefer to create our own IterationError subclassed from RuntimeError. I would also prefer that the raised errors carry an error message. Once this is in, your Python 3 PR should do raise IterationError("some message") from exc, where exc is the original StopIteration exception, so we keep the original cause around.
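The suggestion above can be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: the class body and the message text are placeholders, and first here is written with next() only to have a StopIteration to chain from.

```python
class IterationError(RuntimeError):
    """Raised instead of letting StopIteration escape (sketch only)."""


def first(seq):
    try:
        return next(iter(seq))
    except StopIteration as exc:
        # "from exc" stores the original exception on __cause__, so the
        # traceback shows both the StopIteration and the IterationError.
        raise IterationError("Received empty sequence") from exc


try:
    first([])
except IterationError as err:
    caught = err
```

Because IterationError subclasses RuntimeError, existing callers that catch RuntimeError keep working, while new callers can catch the narrower type.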

@groutr
Contributor Author

groutr commented Nov 4, 2019

@eriknw I'll update the python 3 PR when this gets merged into master.

@groutr changed the title from "[WIP] Raise RuntimeError on StopIteration" to "[WIP] Raise IterationError on StopIteration" on Nov 4, 2019
@groutr
Contributor Author

groutr commented Jan 6, 2020

@eriknw does this need anything else before it can be merged?

@eriknw
Member

eriknw commented Jan 14, 2020

This looks pretty good. As a small nit, I would prefer different error messages. For example, nth(5, [1, 2, 3]) shouldn't say "Received empty sequence", because the sequence passed to nth isn't empty.

@groutr
Contributor Author

groutr commented Jan 15, 2020

@eriknw the exception is raised in first because nth created a sequence with no first element. In that case, nth should probably catch the error from first and generate a more informative message.

def nth(n, seq):
    """ The nth element in a sequence

    >>> nth(1, 'ABC')
    'B'
    """
    if isinstance(seq, (tuple, list, Sequence)):
        return seq[n]
    else:
        try:
            return first(itertools.islice(seq, n, None))
        except IterationError:
            raise IterationError("Element {} does not exist".format(n))

With the exception chaining in Python 3, this is handled nicely.

Catches the IterationError from first and produces a more meaningful
exception message.
@eriknw
Member

eriknw commented Mar 13, 2020

LGTM!

@groutr
Contributor Author

groutr commented Mar 17, 2020

Here is what the errors look like now. @eriknw if this is fine with you, I believe this is also mergeable.

In [2]: itertoolz.first([])
---------------------------------------------------------------------------
IterationError                            Traceback (most recent call last)
<ipython-input-2-6b062b701b06> in <module>
----> 1 itertoolz.first([])

~/projects/toolz/toolz/itertoolz.py in first(seq)
    377     for rv in seq:
    378         return rv
--> 379     raise IterationError("Received empty sequence")
    380
    381

IterationError: Received empty sequence

In [3]: itertoolz.second([1])
---------------------------------------------------------------------------
IterationError                            Traceback (most recent call last)
~/projects/toolz/toolz/itertoolz.py in second(seq)
    388     try:
--> 389         return first(itertools.islice(seq, 1, None))
    390     except IterationError as exc:

~/projects/toolz/toolz/itertoolz.py in first(seq)
    378         return rv
--> 379     raise IterationError("Received empty sequence")
    380

IterationError: Received empty sequence

The above exception was the direct cause of the following exception:

IterationError                            Traceback (most recent call last)
<ipython-input-3-38b92c6791c0> in <module>
----> 1 itertoolz.second([1])

~/projects/toolz/toolz/itertoolz.py in second(seq)
    389         return first(itertools.islice(seq, 1, None))
    390     except IterationError as exc:
--> 391         raise IterationError("Lenth of seq is < 2") from exc
    392
    393

IterationError: Lenth of seq is < 2

In [6]: itertoolz.nth(3, iter([1, 2]))
---------------------------------------------------------------------------
IterationError                            Traceback (most recent call last)
~/projects/toolz/toolz/itertoolz.py in nth(n, seq)
    403         try:
--> 404             return first(itertools.islice(seq, n, None))
    405         except IterationError as exc:

~/projects/toolz/toolz/itertoolz.py in first(seq)
    378         return rv
--> 379     raise IterationError("Received empty sequence")
    380

IterationError: Received empty sequence

The above exception was the direct cause of the following exception:

IterationError                            Traceback (most recent call last)
<ipython-input-6-0efd83b833d7> in <module>
----> 1 itertoolz.nth(3, iter([1, 2]))

~/projects/toolz/toolz/itertoolz.py in nth(n, seq)
    404             return first(itertools.islice(seq, n, None))
    405         except IterationError as exc:
--> 406             raise IterationError("Length of seq is < %d" % n) from exc
    407
    408

IterationError: Length of seq is < 3

@groutr changed the title from "[WIP] Raise IterationError on StopIteration" to "Raise IterationError on StopIteration" on Mar 17, 2020
@eriknw
Member

eriknw commented Mar 17, 2020

Thanks again.

I don't think it's necessary or informative to chain exceptions. In fact, I may actually prefer being a little more verbose and using the for-loop trick instead of first or next. For example, interpose and peek could use

for _ in it:
    break  # skip the first element in `it`
else:
    raise IterationError()

I suspect this is faster too.
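As a concrete illustration of the for/else trick, here is a minimal sketch of how peek might look when written this way. It is not the code from the PR: the IterationError class, the message text, and the exact return shape (the first item plus an iterator that replays the whole sequence) are assumptions made for the example.

```python
import itertools


class IterationError(RuntimeError):
    """Stand-in for the exception class discussed in this thread."""


def peek(seq):
    it = iter(seq)
    for item in it:
        break  # grab the first element; `it` now points at the second
    else:
        # The loop body never ran, so the input was empty.
        raise IterationError("Received empty sequence")
    # Put the consumed element back in front of the remaining items.
    return item, itertools.chain((item,), it)


head, rest = peek([1, 2, 3])
```

No try/except is involved, so there is no StopIteration to catch or chain; the else branch runs only when the loop finishes without hitting break.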

@groutr
Contributor Author

groutr commented Mar 17, 2020

@eriknw I think you've done it again! 👍 Using for loops seems to be faster on my machine. I'll update the PR.

@groutr
Contributor Author

groutr commented Mar 17, 2020

Would this mean that second would now be:

def second(seq):
    it = iter(seq)
    for item in it:
        break
    else:
        raise IterationError
    for item in it:
        return item
    else:
        raise IterationError

@eriknw
Member

eriknw commented Mar 17, 2020

Yeah, that's a good implementation of second. Here's a small variant (probably about the same speed):

def second(seq):
    it = iter(seq)
    for first_element in it:
        for second_element in it:
            return second_element
        raise IterationError('only 1 in seq blah blah')
    raise IterationError('empty seq blah blah')

@groutr
Contributor Author

groutr commented Mar 18, 2020

@eriknw I went with the non-nested implementation for second. While they have very similar performance, the nested version seemed to display a greater variance in performance than the non-nested version on my machine.
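A comparison like the one described above could be run with a sketch along these lines. The function names second_flat and second_nested, the error messages, and the repeat/number settings are all hypothetical; the two bodies follow the variants posted earlier in this thread. Reported numbers will differ by machine and Python version, so none are shown here.

```python
import timeit


class IterationError(RuntimeError):
    """Stand-in for the exception class discussed in this thread."""


def second_flat(seq):
    it = iter(seq)
    for _ in it:
        break  # skip the first element
    else:
        raise IterationError("Received empty sequence")
    for item in it:
        return item
    raise IterationError("Length of seq is < 2")


def second_nested(seq):
    it = iter(seq)
    for _ in it:
        for item in it:
            return item
        raise IterationError("Length of seq is < 2")
    raise IterationError("Received empty sequence")


def bench(func, repeat=5, number=10_000):
    # Return the fastest and slowest per-call time across `repeat` runs;
    # the gap between them gives a rough feel for run-to-run variance.
    runs = timeit.repeat(lambda: func([1, 2]), repeat=repeat, number=number)
    return min(runs) / number, max(runs) / number


for func in (second_flat, second_nested):
    best, worst = bench(func)
    print("{}: best {:.1f} ns, worst {:.1f} ns".format(
        func.__name__, best * 1e9, worst * 1e9))
```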

@groutr
Contributor Author

groutr commented Mar 18, 2020

BTW, there were small performance improvements for most of the functions involved in this PR. 👍 I can post benchmarks if needed.

@groutr
Contributor Author

groutr commented Mar 18, 2020

All timings show the original function first, followed by the modified function.

first

In [6]: %timeit itertoolz.first([1])
341 ns ± 2.47 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [14]: %timeit itertoolz.first([1])
226 ns ± 1.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

second

In [7]: %timeit itertoolz.second([1, 2])
449 ns ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit itertoolz.second([1, 2])
348 ns ± 4.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

interpose

In [18]: %timeit itertoolz.interpose(None, [1]*50)
1.14 µs ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [33]: %timeit itertoolz.interpose(None, [1]*50)
1.08 µs ± 17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

peek

In [17]: %timeit itertoolz.peek([1, 2, 3, 4])
632 ns ± 16.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [32]: %timeit itertoolz.peek([1, 2, 3, 4])
596 ns ± 20.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

peekn

In [14]: %timeit itertoolz.peekn(4, range(100))
1.35 µs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [28]: %timeit itertoolz.peekn(4, range(100))
1.32 µs ± 70.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

nth

In [15]: %timeit itertoolz.nth(15, range(20))
1.61 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [30]: %timeit itertoolz.nth(15, range(20))
1.54 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@eriknw
Member

eriknw commented Mar 18, 2020

I think this looks good.

One more thing: should we put IterationError in __init__.py?

@groutr
Contributor Author

groutr commented Mar 18, 2020

@eriknw What do you think about putting the exceptions module in __init__.py, similar to curried and sandbox?

I'm working on a cytoolz PR that syncs the changes in this PR.

@eriknw
Member

eriknw commented Oct 28, 2021

I think this needs to go in. @groutr, since it's been a while, would you care to take another quick look at this PR and the discussion in #465?

I think to close #465, we may want to add a default=no_default argument to these functions as an alternative to raising.
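One way the default=no_default idea could look is sketched below. This is only an illustration of the proposal, not an agreed design: the sentinel is written here as a private object() (how no_default is actually defined is not shown in this thread), and the message text is a placeholder.

```python
class IterationError(RuntimeError):
    """Stand-in for the exception class discussed in this thread."""


# Hypothetical sentinel; using a unique object lets callers pass None
# (or any other value) as a legitimate default.
no_default = object()


def first(seq, default=no_default):
    for item in seq:
        return item
    if default is no_default:
        raise IterationError("Received empty sequence")
    return default


fallback = first([], default=None)   # returns the default, no exception
value = first([3, 4], default=None)  # default is ignored when seq has items
```

Comparing with "is" against the sentinel keeps the no-argument call raising as before, so existing callers would see no change in behaviour.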

Development

Successfully merging this pull request may close these issues.

StopIteration from first() caught by a generator
2 participants