Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method iterRecords_range #309

Closed
wants to merge 3 commits into from
Closed

Conversation

lguez
Copy link
Contributor

@lguez lguez commented Oct 15, 2024

I have tested this by reading about one million record:

#!/usr/bin/env python3

import time

import shapefile

r = shapefile.Reader("extremum")
t0 = time.perf_counter()

for i in range(10, 1000000):
    x = r.record(i)

t1 = time.perf_counter()
print(t1 - t0)
t0 = time.perf_counter()

for x in r.iterRecords_range(10, 1000000):
    pass

t1 = time.perf_counter()
print(t1 - t0)

I get:

$ test_iterRecords_range.py 
21.27043527108617
7.878919951850548

So iterRecords_range is somewhat faster.

Using iterRecords with a range option should be faster than calling
record within a loop, since we avoid the multiple calls to seek.
This reverts commit e41b03c. JamesParrott pointed that I did not understand
the way `__record` works: __record does not use oid to find the correct
record, it just assumes it is the correct oid for the current position.
Using the method `iterRecords_range` should be somewhat faster than
calling the method `record` within a loop, since we avoid the repeated
calls to seek inside `record`.
@JamesParrott
Copy link
Collaborator

Great stuff - thanks Lionel :). This appears to be good from a quick look - I'll review it fully over the next few days and get back to you.

@lguez
Copy link
Contributor Author

lguez commented Oct 16, 2024

OK. Thanks. I have written a separate method because you said you prefered so, but I still think it would make simpler code and it would be clearer to the user to have additional optional arguments to the iterRecords method:

def iterRecords(self, fields=None, start=0, stop=None)

@JamesParrott
Copy link
Collaborator

JamesParrott commented Oct 17, 2024

Yeah, adjusting iterRecords is more sensible if the range iterator is not customisable, just start and stop (in future, step could be added too). It avoids duplication.

How's this look?
lguez/pyshp@master...GeospatialPython:pyshp:combine_iterRecords_range_into_iterRecords

[edit] I added one test, and the branch above passes it (and also passes all the previous ones).

I think "yields the same _Records, as Reader.record" covers the claim and the main contract with the user.

Can you, or anyone, think of anything else that should be tested?

@lguez
Copy link
Contributor Author

lguez commented Oct 18, 2024

No, it seems fine. Thanks.

@JamesParrott
Copy link
Collaborator

A branch has been made from this PR, which will be merged: https://github.com/GeospatialPython/pyshp/tree/combine_iterRecords_range_into_iterRecords

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants