[pyos][question] How do the operators work? #429

sneakers-the-rat · 2023-04-10T01:39:32Z

I am trying to piece my way through the code and I figured it would probably be easier to just ask.

Perhaps related to : #427

Starting from the docs, I see 6 types of operators: unary, binary, monoid, etc.
Those seem to be intended to be accessed from top-level namespaces like graphblas.binary
Each of the operator type modules has an __init__.py file that seems to be roughly duplicated between operators. They seem to be populated as a side effect of importing from ..core import operator and import .numpy
The .numpy module within each operator type seems to be the place where the operators are actually listed along with a mapping between graphblas and numpy
That seems to get packed up into an __all__ and maybe from there that is how they are called, but I am not sure how/where that lookup is actually made
There also seems to be some step in core.operator.utils that initializes each of the op classes?

I'm probably missing a lot but that's where I have reached so far. I think this would be a really good thing to have in the docs since it also relates to a few other questions eg. how to document each of the different operator types so someone using the package can discover them and how the code is structured generally. A decent amount of this seems to happen in side effects and by dynamically modifying the modules on/after import. It seems to work fine, which is awesome, so i believe y'all understand how this works but it seems like a lot of cognitive overhead for someone who might be interested in being a contributor.

I was thinking it might be a good think to think out loud about, and i'm happy to be a novice reader/fresh eyes/however you want to call it that can help point out where i get lost when trying to follow the thread.

Part of: pyOpenSci/software-submission#81

The text was updated successfully, but these errors were encountered:

eriknw · 2023-04-12T02:32:45Z

You seem to be getting the rough idea correct I think. Thanks for thinking out loud!

The .numpy module within each operator type seems to be the place where the operators are actually listed along with a mapping between graphblas and numpy

The .numpy modules use the same names as operators from NumPy. By default, these use builtin GraphBLAS operators if possible and user-defined functions compiled with numba otherwise. These namespaces are meant for conveniences and may be helpful if we were to ever try to follow the Python array API standard.

That seems to get packed up into an all and maybe from there that is how they are called, but I am not sure how/where that lookup is actually made

I'm not sure I follow. What do you mean "that is how they care called"? Calling e.g. UnaryOp._initialize() populates the graphblas.unary and graphblas.unary.numpy namespaces. For UDFs, we delay compiling them (b/c doing so is very slow and would adversely affect import times), which is why modules often have _delayed dict and custom __dir__ and __getattr__ that will correctly handle the delayed objects (as well as give deprecation warnings).

There also seems to be some step in core.operator.utils that initializes each of the op classes?

Yeah, that's currently where things get initialized, which populates the operator namespaces. Perhaps it would be clearer to move the initialization data and methods off of the classes and into e.g. core.operator.initialize (which may need refactored again after we add more backends, such as having a SuiteSparse-specific module for initializing, etc.), which would also perform the initialzation. utils isn't the first place anybody would look to see where things get initialized. I think I like this idea as a way to further simplify the operator classes and code.

Did you read my super-short description from #410 (comment)?

Here's a quick sketch of the important bits: we call e.g. UnaryOp._initialize,
which then does a regex on lib (the objects from cffi that wraps the C implementation)
to find unary operators, then we add all the typed unary operators
(as TypedBuiltinUnaryOp) to UnaryOp, which we save to gb.unary.
We do the same thing for binary operators, monoids, semirings, and indexunaryops.

I may have mentioned this before, but our team discussed this and we would like for data about operators to be declared upstream in suitesparse-graphblas as e.g. JSON or dicts. We would probably add a little more metadata as JSON or dict for additional things we do in python-graphblas. This would be an easy place for us (or an eager contributor) to add docstrings for each operator. I agree with your general sentiment that having things declarative rather than derived from sideeffects is usually clearer and preferred, and it would be especially nice to have more details in the docs website.

Adding module docstrings, class docstrings, code comments, etc. to make the code more navigable and understandable is high on my priority list, because right now I'm uniquely position to do so (I believe the rest of the team could also do so, but I can probably do so better and more completely) and this will help future contributors (or users trying to understand/debug something), which is super-important for long-term viability of the project. I will focus on this and refactoring as appropriate to make things clearer before I focus on high-level documentation, and I'm super-grateful that the rest of the team has done the vast majority of high-level documentation thus far.

Keep your questions and streams-of-thoughts coming! :)

sneakers-the-rat · 2023-04-17T20:06:54Z

OK I see the _initialize method in the individual operators now. following that. Yes I think this would all be useful to have in the docs. A simple block diagram would go a long way here I think, to be able to trace "these are the functions/classes/methods brought up from cffi, they are initialized here, and then they end up in this namespace" - nothing too fancy, just something to guide people who are curious about the implementation and might want to become contributors.

Yes upstream metadata would be nice. I pieced through this myself, wondering how much could be inferred from the cffi imports themselves, and it doesn't look like much - I see why a lot is hardcoded here, it's otherwise impossible to tell which method/function goes with what. getting started reading again and hoping to finish today

sneakers-the-rat mentioned this issue Apr 10, 2023

Python-graphblas: high-performance sparse linear algebra for scalable graph analytics pyOpenSci/software-submission#81

Closed

24 tasks

eriknw added the discussion Discussing a topic with no specific actions yet label Apr 12, 2023

eriknw mentioned this issue Apr 12, 2023

Split operator initialization into its own module(s) #430

Open

sneakers-the-rat mentioned this issue Jun 27, 2023

[pyos][docs] API Docs missing large amount of user-facing functions/methods/classes #410

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pyos][question] How do the operators work? #429

[pyos][question] How do the operators work? #429

sneakers-the-rat commented Apr 10, 2023

eriknw commented Apr 12, 2023 •

edited

Loading

sneakers-the-rat commented Apr 17, 2023

[pyos][question] How do the operators work? #429

[pyos][question] How do the operators work? #429

Comments

sneakers-the-rat commented Apr 10, 2023

eriknw commented Apr 12, 2023 • edited Loading

sneakers-the-rat commented Apr 17, 2023

eriknw commented Apr 12, 2023 •

edited

Loading