Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use "dataclasses" #60

Merged
merged 19 commits into from
Oct 23, 2020
Merged

Use "dataclasses" #60

merged 19 commits into from
Oct 23, 2020

Conversation

ziotom78
Copy link
Member

@ziotom78 ziotom78 commented Sep 23, 2020

This PR adds the support for
dataclasses, a
feature of Python 3.7+ that is available on Python 3.6 too thanks to a
backward-compatible package.

This PR uses @dataclass to define the following classes:

  • DetectorInfo (previously Detector)
  • FreqChannelInfo
  • A way to quickly read from a TOML file/dictionary a list of detectors, possibly specifying entire channels at once
  • InstrumentInfo
  • Documentation
  • Agree on a few breaking changes for Instrument/InstrumentInfo (Proposal: a few breaking changes #65)

Using `@dataclass` vastly simplifies the implementation of `Detector`,
and the new name `DetectorInfo` makes the connection with the IMO
quantity clearer.
Apparently, Poetry 1.0 does not handle dependencies on specific
versions of Python correctly:

python-poetry/poetry#1413

While waiting for Poetry 1.1, this commit tries to fix the problem.
@ziotom78
Copy link
Member Author

The classes DetectorInfo and FreqChannelInfo are ready. This PR however should wait the resolution of the following minor issues in litebird_imo before it can be merged:

The reason why these are needed is because the fields in DetectorInfo and FreqChannelInfo refer to the new field names proposed in these issues.

This is a **breaking** change: the `Instrument` class is renamed to
`InstrumentInfo`, to match the names `DetectorInfo` and
`FreqChannelInfo`. For consistency, the class has been moved from
`instruments.py` to `detectors.py`, which already contained the
definition for classes `DetectorInfo` and `FreqChannelInfo`.

Moreover, the fields used by `InstrumentInfo` to hold angles can no
longer be initialized with values in *degrees*: you must explicitly
pass them as radians. This makes the interface to the class clearer:
the parameters you pass to the constructor are the same you can access
once the object has been constructed.

The mock IMO used in tests has been updated to use the name
`instrument_info` instead of `focal_plane`, to be consistent with the
true IMO. A few new data files have been added to this IMO, to test
the ability to produce data files.

All the examples in the documentation have been updated accordingly.
This was missing from the previous commit. It implements a
general-purpose way to produce a list of `DetectorInfo` objects out of
a set of definitions in a dictionary, and it is meant to be used with
TOML files.

A new page about detectors, frequency channels, and instruments has
been added to the documentation.
This might resolve a few problems when installing the
"pillow" dependency under Mac OS X:

python-pillow/Pillow#4242

(the title of the issue is misleading, as the problem
does not affect only Windows user, as it's evident
from the thread.)
@ziotom78
Copy link
Member Author

The PR seems in good shape now. Please have a look at the documentation here:

https://litebird-sim.readthedocs.io/en/dataclasses/detectors.html

It will be merged mid next week, if no concerns arise.

@dpole
Copy link
Member

dpole commented Oct 16, 2020

Hi, I've got two comments.

Starting with the quick one, it seems to me that we should change name to detector.py, even though I don't have a specific suggestion.

Second, we touched the topic elsewhere, but I promised during the last call that I'll drop a line also here. I'm am a bit afraid of the proliferation of classes storing the information of some part of the instrument. As I said in #53 , I prefer bare data: it is less safe, less OO (which is actually a pro to me) but more transparent and minimal (in the sense that it requires the user to get familiar with fewer concepts). I am fine with creating specialized classes for central pieces of information (as in this PR), but, looking more into the future, I am wondering whether we have clear ideas about where to draw the line and how to handle information beyond that line. Otherwise we'll end up with one dataclass per IMO entity.

@ziotom78
Copy link
Member Author

Hi @dpole , a few comments:

  • detectors.py was named similarly to observations.py, simulations.py, quaternions.py, instruments.py, i.e., plurals. If we decide to change it, we must change everything to stay consistent. Do you think it's worth doing this? As far as I can imagine, the user will just write import litebird_sim, and the name of the sub-modules will be rarely used.

  • I share your concerns about OO, but in my experience its biggest pain points are:

    • Heavy use of complex inheritance hierarchies, which we avoid almost completely in our codebase

    • Difficulty to access "private" members from the outside during debugging, which we avoid entirely because Python has no private members

    • Having so many "getter" and "setter" methods in classes hurts eyes. However, so far we have avoided these entirely

    • Boresome definition of constructors that have to initialize all the members one by one, which this PR solves

While I agree that OO can make the codebase more difficult to understand (the old joke about the NotificationStrategyFactory!), I have never seen any data that backs up the claim that just using plain structs makes things more difficult for users. (Yes, I used the word "struct", because the way we are using classes in this codebase is more similar to C/Julia struct types than to C++ classes.) The ease of use depends on the API, not on the fact that we're avoiding the class keyword and the object.member notation (just two concepts to learn).

Finally, I too would like to avoid having too many classes, but my reason is against it is because of the "too many" part of the sentence, not because any OO construct must be avoided at all costs and in any case. I suggest we employ the following rule (abridged from here):

  1. If the fields are always the same during runtime, implement a dataclass;
  2. If the fields can change at runtime depending on the user's input, use a dictionary.

The three classes DetectorInfo, FreqChannelInfo, and InstrumentInfo are meant to be a one-to-one replica of IMO objects, and thus they fall under point 1. For custom data like the parameters field of the Simulation class, we use the dictionary according to point 2.

@dpole
Copy link
Member

dpole commented Oct 16, 2020

I see, thanks. I think we're talking about the same (or close enough) shade of gray from different angles. In particular, I agree that if we use dataclasses as C-like structures no complexity is added. This intention wasn't clear to me. A last question to clarify my practical concern about where to draw the line. Suppose that we add information about the HWP in the IMo with all sort of instrumental properties defined in it. How would you handle it? Is the idea to define a dedicated class?

About my first comment, it wasn't about the plural but rather the fact that there is more than detectors in the module (namely channel and instrument).

Finally...I am a bit disappointed because I didn't get the NotificationStrategyFactory joke. It sounded like one of those nerdy things that make my day ;P

@ziotom78
Copy link
Member Author

Suppose that we add information about the HWP in the IMo with all sort of instrumental properties defined in it. How would you handle it? Is the idea to define a dedicated class?

Yes, indeed: if the object is to be read from the IMO, it means that the set of metadata is "frozen" by a Specification Document, and thus it falls in the 1st category above.

In this way, we can get the information about an HWP using the construct

hwp_info = HwpInfo.from_imo(imo, "/ref/to/hwp")

similarly to what we already do with DetectorInfo.from_imo and similar.

Now I get the point about detectors.py. What about something that underlines their close connection with the IMO? E.g., imo_datatypes.

@dpole
Copy link
Member

dpole commented Oct 16, 2020

Yes, indeed: if the object is to be read from the IMO, it means that the set of metadata is "frozen" by a Specification Document, and thus it falls in the 1st category above.

Maybe I'm overestimating the problem, but I imagine that the IMo will have a considerable number of different entities in the future. We are committing to writing a dataclass replicating the content of the imo specification for every entity. It seems to me a significant burden (maybe unnecessary) on developers.

Beyond the effort issue, maybe it is somehow forbidden by some IMo specification, but I can imagine an entity with many child entities. Walking this hierarchy of entities and calling at each step the proper constructor can lead to quite a bit of complicated code. Having an "anonymous" data container into which any entity may be loaded using a generic function seems a more straightforward and "scalable" solution to me.

@ziotom78
Copy link
Member Author

Maybe I'm overestimating the problem, but I imagine that the IMo will have a considerable number of different entities in the future. We are committing to writing a dataclass replicating the content of the imo specification for the entity. It seems to me an significant burden (maybe unnecessary) on developers.

I do not expect that many details in the IMO will need to be accessed by our framework. (E.g., mechanical structure of the many hardware parts, thermal models, optical models, RF models, circuit diagrams, component datasheets, etc.) We're going to implement dataclasses only for those modules we actually need for the simulation.

Beyond the effort issue, maybe it is somehow forbidden by some IMo specification, but I can imagine an entity with many child entities. Walking this hierarchy of entities and calling at each step the proper constructor can lead to quite a bit of complicated code. Having an "anonymous" data container into which any entity may be loaded using a generic function seems a more straightforward and "scalable" solution to me.

This framework is supposed only to load data files (not entities), and data files, like quantities, have no hierarchy in the current specification of the IMO.

@ziotom78 ziotom78 merged commit e5d3ec9 into master Oct 23, 2020
@ziotom78 ziotom78 deleted the dataclasses branch October 23, 2020 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants