Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pyos][question] How do the operators work? #429

Open
sneakers-the-rat opened this issue Apr 10, 2023 · 2 comments
Open

[pyos][question] How do the operators work? #429

sneakers-the-rat opened this issue Apr 10, 2023 · 2 comments
Labels
discussion Discussing a topic with no specific actions yet

Comments

@sneakers-the-rat
Copy link

I am trying to piece my way through the code and I figured it would probably be easier to just ask.

Perhaps related to : #427

  • Starting from the docs, I see 6 types of operators: unary, binary, monoid, etc.
  • Those seem to be intended to be accessed from top-level namespaces like graphblas.binary
  • Each of the operator type modules has an __init__.py file that seems to be roughly duplicated between operators. They seem to be populated as a side effect of importing from ..core import operator and import .numpy
  • The .numpy module within each operator type seems to be the place where the operators are actually listed along with a mapping between graphblas and numpy
  • That seems to get packed up into an __all__ and maybe from there that is how they are called, but I am not sure how/where that lookup is actually made
  • There also seems to be some step in core.operator.utils that initializes each of the op classes?

I'm probably missing a lot but that's where I have reached so far. I think this would be a really good thing to have in the docs since it also relates to a few other questions eg. how to document each of the different operator types so someone using the package can discover them and how the code is structured generally. A decent amount of this seems to happen in side effects and by dynamically modifying the modules on/after import. It seems to work fine, which is awesome, so i believe y'all understand how this works but it seems like a lot of cognitive overhead for someone who might be interested in being a contributor.

I was thinking it might be a good think to think out loud about, and i'm happy to be a novice reader/fresh eyes/however you want to call it that can help point out where i get lost when trying to follow the thread.

Part of: pyOpenSci/software-submission#81

@eriknw
Copy link
Member

eriknw commented Apr 12, 2023

You seem to be getting the rough idea correct I think. Thanks for thinking out loud!

The .numpy module within each operator type seems to be the place where the operators are actually listed along with a mapping between graphblas and numpy

The .numpy modules use the same names as operators from NumPy. By default, these use builtin GraphBLAS operators if possible and user-defined functions compiled with numba otherwise. These namespaces are meant for conveniences and may be helpful if we were to ever try to follow the Python array API standard.

That seems to get packed up into an all and maybe from there that is how they are called, but I am not sure how/where that lookup is actually made

I'm not sure I follow. What do you mean "that is how they care called"? Calling e.g. UnaryOp._initialize() populates the graphblas.unary and graphblas.unary.numpy namespaces. For UDFs, we delay compiling them (b/c doing so is very slow and would adversely affect import times), which is why modules often have _delayed dict and custom __dir__ and __getattr__ that will correctly handle the delayed objects (as well as give deprecation warnings).

There also seems to be some step in core.operator.utils that initializes each of the op classes?

Yeah, that's currently where things get initialized, which populates the operator namespaces. Perhaps it would be clearer to move the initialization data and methods off of the classes and into e.g. core.operator.initialize (which may need refactored again after we add more backends, such as having a SuiteSparse-specific module for initializing, etc.), which would also perform the initialzation. utils isn't the first place anybody would look to see where things get initialized. I think I like this idea as a way to further simplify the operator classes and code.

Did you read my super-short description from #410 (comment)?

Here's a quick sketch of the important bits: we call e.g. UnaryOp._initialize,
which then does a regex on lib (the objects from cffi that wraps the C implementation)
to find unary operators, then we add all the typed unary operators
(as TypedBuiltinUnaryOp) to UnaryOp, which we save to gb.unary.
We do the same thing for binary operators, monoids, semirings, and indexunaryops.

I may have mentioned this before, but our team discussed this and we would like for data about operators to be declared upstream in suitesparse-graphblas as e.g. JSON or dicts. We would probably add a little more metadata as JSON or dict for additional things we do in python-graphblas. This would be an easy place for us (or an eager contributor) to add docstrings for each operator. I agree with your general sentiment that having things declarative rather than derived from sideeffects is usually clearer and preferred, and it would be especially nice to have more details in the docs website.

Adding module docstrings, class docstrings, code comments, etc. to make the code more navigable and understandable is high on my priority list, because right now I'm uniquely position to do so (I believe the rest of the team could also do so, but I can probably do so better and more completely) and this will help future contributors (or users trying to understand/debug something), which is super-important for long-term viability of the project. I will focus on this and refactoring as appropriate to make things clearer before I focus on high-level documentation, and I'm super-grateful that the rest of the team has done the vast majority of high-level documentation thus far.

Keep your questions and streams-of-thoughts coming! :)

@eriknw eriknw added the discussion Discussing a topic with no specific actions yet label Apr 12, 2023
@sneakers-the-rat
Copy link
Author

OK I see the _initialize method in the individual operators now. following that. Yes I think this would all be useful to have in the docs. A simple block diagram would go a long way here I think, to be able to trace "these are the functions/classes/methods brought up from cffi, they are initialized here, and then they end up in this namespace" - nothing too fancy, just something to guide people who are curious about the implementation and might want to become contributors.

Yes upstream metadata would be nice. I pieced through this myself, wondering how much could be inferred from the cffi imports themselves, and it doesn't look like much - I see why a lot is hardcoded here, it's otherwise impossible to tell which method/function goes with what. getting started reading again and hoping to finish today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussing a topic with no specific actions yet
Projects
None yet
Development

No branches or pull requests

2 participants