Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow some graph and digraph methods to take iterables/generators #1292

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jamestwebber
Copy link

@jamestwebber jamestwebber commented Oct 8, 2024

I was using the library and found myself wanting to add nodes and edges from a generator, e.g. g.add_edges_from((i,j,w) for i in ...). It feels wasteful, and just unpythonic/unusual, to have to create a list for that type of thing. So I thought I would try modifying the code to make this possible, and it wasn't too bad.

I tweaked the following methods to take an Iterable rather than a Sequence, which allows the use of generators, including things like zip and map:

  • PyGraph.add_nodes_from
  • PyGraph.add_edges_from
  • PyGraph.add_edges_from_no_data
  • PyGraph.extend_from_edge_list
  • PyGraph.extend_from_weighted_edge_list
  • all of these again, for PyDiGraph

I didn't change the name of the argument, although obj_list isn't really accurate anymore. I figured it might desirable to keep it the same for compatibility, although it's a positional-only argument it doesn't matter too much?

  • I ran rustfmt locally
  • I have added the tests to cover my changes.
  • I have updated the documentation accordingly. (only the docstrings right now)
  • I have read the CONTRIBUTING document.

If this change is desired, I'll add some tests and update the documentation. I'm also happy to accept advice on how to implement this better--there might be a better way to write the signature such that pyo3 does more of the work for me, but I didn't see it.

@CLAassistant
Copy link

CLAassistant commented Oct 8, 2024

CLA assistant check
All committers have signed the CLA.

@jamestwebber jamestwebber changed the title Allow some graph and digraph methods can to iterables/generators Allow some graph and digraph methods to take iterables/generators Oct 8, 2024
@coveralls
Copy link

coveralls commented Oct 8, 2024

Pull Request Test Coverage Report for Build 11246526544

Details

  • 61 of 61 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.002%) to 95.818%

Totals Coverage Status
Change from base Build 11172805539: 0.002%
Covered Lines: 18009
Relevant Lines: 18795

💛 - Coveralls

@jamestwebber
Copy link
Author

For consistency I made the same modification to remove_nodes_from and remove_edges_from.

I tweaked the docstrings maybe not enough--they still say param list in them.

@IvanIsCoding
Copy link
Collaborator

From an ergonomics point of view, this PR is very welcome!

From an implementation point of view… I think we need to see if there are performance regressions. It might be faster, it might be slower. We’ll have to check

@jamestwebber
Copy link
Author

Makes sense to do more comprehensive testing for performance. In my one-off test*, it looks like this version is the same performance when given an existing list, and using a generator is considerably faster (~2/3 of the time) than instantiating a temporary list.

* comparing the time of g.extend_from_edge_list([(i,j) for i in range(1000) for j in range(1000)]) to the generator equivalent.

@jamestwebber jamestwebber marked this pull request as ready for review October 9, 2024 01:55
@IvanIsCoding
Copy link
Collaborator

Makes sense to do more comprehensive testing for performance. In my one-off test*, it looks like this version is the same performance when given an existing list, and using a generator is considerably faster (~2/3 of the time) than instantiating a temporary list.

  • comparing the time of g.extend_from_edge_list([(i,j) for i in range(1000) for j in range(1000)]) to the generator equivalent.

That is a good starting point, I'd say even timeit.timeit with a Jupyter notebook get us going. I just want to avoid a situation like #1090 over again

One thing we have to make sure is separating the time Python take to create [(i,j) for i in range(1000) for j in range(1000)] from the benchmark, there might be a syntax error but it looks roughly like this:

import rustworkx as rx
import timeit

# create this outside the loop
data = [(i,j) for i in range(1000) for j in range(1000)]

timeit.timeit('g = rx.Graph(); g.extend_from_edge_list(data)', number=100)

If you move the data creation to inside the benchmark, you're checking how much time Python takes to build a list which for some of our users is inevitable if it is an API response or reading from a file.

let mut out_list: Vec<usize> = Vec::with_capacity(obj_list.len());
for obj in obj_list {
let mut out_list: Vec<usize> = Vec::new();
for py_obj in obj_list.iter()? {
Copy link
Collaborator

@IvanIsCoding IvanIsCoding Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main other alternatives I want to test are:

@jamestwebber
Copy link
Author

If you move the data creation to inside the benchmark, you're checking how much time Python takes to build a list which for some of our users is inevitable if it is an API response or reading from a file.

Ah yes, I should have been clearer about the testing (I didn't realize your "we need to check for performance" comment included me 😅). I realized I didn't have matching python versions so I had to do it again anyway.

I looked at a few versions with timeit, all preceded with g = rx.PyGraph() or g = rx.PyDiGraph():

  1. g.extend_from_edge_list([(i,j) for i in range(1000) for j in range(1000)])
  2. g.extend_from_edge_list(edge_list) when edge_list is already defined.
  3. g.extend_from_edge_list((i,j) for i in range(1000) for j in range(1000))

For 1 and 2, I tried it with this PR and the current release (0.15.1) on 3.11.6. The results for PyGraph (same thing for the other one):

0.15.1 this PR
1 157 ms ± 3.45 ms 154 ms ± 3.98 ms
2 58.7 ms ± 504 μs 52.1 ms ± 3.18 ms
3 n/a 105 ms ± 2.61 ms

I see some variance on my machine when I do this multiple times (I think from multitasking etc) but it's in this ballpark. So it doesn't look like existing code would be negatively affected, and there's an additional capability that can be more efficient.

@IvanIsCoding
Copy link
Collaborator

Oh either of us could have done the testing, sorry for the miscommunication.

Regardless, the numbers look great! We need to add tests but that should be straightforward, the hard part is covering each endpoint. One tip I will give is using methods like iter() and reversed() to get iterators from objects we already have in the tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants