Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving formatting into a delegate #1913

Open
hgrecco opened this issue Jan 4, 2024 · 56 comments
Open

Moving formatting into a delegate #1913

hgrecco opened this issue Jan 4, 2024 · 56 comments

Comments

@hgrecco
Copy link
Owner

hgrecco commented Jan 4, 2024

The purpose of this issue is to have a central place to discuss everything related to formatting in 0.24

In the next version of Pint, all formatting operations are going to be moved into a Delegate. As expressed in #1895"

string formatting is becoming more and more complex as users request different (reasonable) means to represent quantities (both magnitude and units). Right now, formatting is intertwined with the registry. Adding more configuration settings increases the registry API surface and makes it harder to maintain and upgrade. The plan is to create a Formatter Delegate. All formatting settings will work directly on the delegate (for a yet unspecified number of versions, we will map this to existing registry functions for backward compatibility.). Also, users will be able to build their own formatter and replace the existing one.

Part of this work has been merged into master, and the rest is being tested in develop.

A few important things:

  1. Pint version 0.24 will be backwards compatible in relation to formatting.
  2. Old expected behavior should not change.
  3. New behavior might be introduced. e.g. Implement sorting units by dimension order #1864
  4. Clear bugs can be fixed. e.g. format_babel doesn't use comma seperator from babel locale #1849, Format doesn't respect width #1798
  5. Existing functions/properties have been rewriting to point to the new API (and marked deprecated).

Please add other issues, or PRs

As of 9e0789c

  1. All test are passing (except doc building).
  2. Docs are not building. (maybe due to 3)
  3. The separate_format_defaults is not implemented.

Please provide your insights or comments about the API and overall organization.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 4, 2024

@keewis I would really like your feedback on this, particularly into the separate_format_defaults.

@keewis
Copy link
Contributor

keewis commented Jan 9, 2024

I think I need to understand your plans a bit better to be able to comment on them. In particular, I'd like to understand how you imagine the delegates to be used (the implementation I can slowly try to understand once that's done), and how this impacts the use of the register_unit_format decorator. (As an aside, I like to think that starting with the way it is used helps a lot with API design).

I'd also recommend thinking about the different ways to customize the format string now: this would make it much easier to make room for features like the "custom modifiers" I had a PR for (which unfortunately is stale now).

The separate_format_defaults was a way to gracefully transition from one way of interpreting the spec to a different one, and as with any deprecation it would be possible to remove it at some point. So I wouldn't worry about that for now, we can put it back later if we think it would break too many people's code.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 9, 2024

My idea is to start pushing composition over inheritance as much as I can in Pint. (I also would like to discuss the idea of adopting protocols, but that is another discussion!)

For the end users, there would not be much of a change. In the next versions, no change at all. Later, certain configurations are going to be within ureg.formatter (i.e. ureg.set_fmt_locale -> ureg.formatter.set_locale).

For the advanced users, it would allow them to swap the formatter for their own by anything that implements format_quantity and format_unit (and maybe format_measurement and maybe format_magnitude). See here This allows them to implement the desired format in an isolated way.

class MyFormatter:
    """Here goes the code
    """ 

ureg.formatter = MyFormatter()

For pint developers, it allows them to better organize the code but having all formatting things in a single place. Also, it could simplify testing because we could in certain tests replace the more complex formatter by something simpler.

We can make this work with register_unit_format by hooking this decorator with the default Formatter object.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 9, 2024

By the way @MichaelTiemannOSC was also trying to implement certain formatting improvements that I think should be easier to implement with this delegate.

@keewis
Copy link
Contributor

keewis commented Jan 10, 2024

If I understand correctly, then, you're planning to move (and refactor) the code that is currently in the registry to the default formatter object. With the effect that instead of hard-coding the format parsing and the call of the appropriate formatting function, you'd even be able to customize those?

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 10, 2024

If I understand correctly, then, you're planning to move (and refactor) the code that is currently in the registry to the default formatter object.

Exactly, this is partially done in develop. The Formatting facet is gone, and the formatter delegate is in place.

If you desire a specific output format, you can simply exchange the formatter object easily by a custom without creating a new formatting string.

If you want to create a new spec, you still can.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 15, 2024

I was testing a few ideas. A few comments:

  1. The idea of the formatter is quite nice. I allow you to a lot of things wihout hacking pint source code. Example:
import pint

from pint.delegates.formatter.base_formatter import BaseFormatter

class NewFormatter(BaseFormatter):

    def format_quantity(self, quantity, spec: str = "") -> str:
          return "Magnitude: {}\nUnits: {}".format(quantity.m, quantity.u)

ureg = pint.UnitRegistry()
ureg.formatter = NewFormatter()

print(str(3 * ureg.meter))

returns

Magnitude: 3
Units: meter
  1. Some units formatters modify not only the units but also the magnitudes. For example, pretty printing will change * by ×.
  2. format_magnitude is a nice addition.

@MichaelTiemannOSC
Copy link
Collaborator

The failure of the docs build is not due to #1864. I think it is due to things changing in Sphinx that break various pins within pint.

The changes to allow sorting by dimension order can easily be overriden in child classes as well (dim_order now being just an instance variable of BaseFormatter).

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 18, 2024

This is just to show some of the progress made with the new formatter. The new organization not only is cleaner, but some things are fixed automatically. For example:

Old version

 	 en_US 	 3.14 meters²
 	 es_ES 	 3.14 metros²
C 	 en_US 	 3.14 meters²
C 	 es_ES 	 3.14 metros²
P 	 en_US 	 3.14 meters²
P 	 es_ES 	 3.14 metros²
L 	 en_US 	 3.14 \mathrm{meter}^{2}
L 	 es_ES 	 3.14 \mathrm{meter}^{2}
Lx 	 en_US 	 3.14 \si[]{\meter\squared}
Lx 	 es_ES 	 3.14 \si[]{\meter\squared}

New version

 	 en_US 	 3.14 meters ** 2
 	 es_ES 	 3,14 metros ** 2
C 	 en_US 	 3.14 meters**2
C 	 es_ES 	 3,14 metros**2
P 	 en_US 	 3.14 meters²
P 	 es_ES 	 3,14 metros²
L 	 en_US 	 3.14\ \mathrm{meters}^{2}
L 	 es_ES 	 3,14\ \mathrm{metros}^{2}
Lx 	 en_US 	 3.14\ \si[]{\meters\squared}
Lx 	 es_ES 	 3,14\ \si[]{\metros\squared}
  1. Notice how L and Lx now work properly in relation with unit names (they were not handled well before).
  2. Notice also how all numbers have the right decimal separator!
  3. Another thing I noticed after trying different architectures, that locale awareness cannot be just and addition. If you have a babel and non-babel classes, you end up with too many of them. That is why babel related keywords are now in all methods. Formatters are still very easy to build, the example above will work. babel will still be optional, but locale keywords will be needed. Without babel, number formatting will be affected by locale but not the unit names.

Take a look at develop ... (test are failing but it is shaping nicely)

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 19, 2024

Fewer and fewer tests are failing, but now there is a decision to be made. While I have been trying to make everything backwards compatible, it is becoming hard to keep the same string representation for numbers. As I mentioned before, I am fixing the localized version (that is a change, but a valuable one). The previous behavior was broken.

But there is one more: number of digits.

Are we ok with changing how numbers are formatted?

Until now, numbers were formatted into strings sometimes with str(value) except when a spec was provided (or a numpy array was being formatted). But this leads to unexpected differences in the way numbers are shown. Also, this is not localizable. The way to make numbers localizable with format is using the n type.

But:

>>> str(24.0)
24.0
>>> format(24.0, "n")
24
>>> str(1.234567890)
'1.23456789'
>>> format(1.234567890, "n")
'1.23457'

In other words, using the n format changes the way magnitudes (and quantities) are printed. We could create a function to handle all the cases, but is a pain in the neck. Also, using n requires changing the locale which is not thread safe AFAIK.

Another way would be to use babel format_decimal. On very nice thing about this is that it does not require changing the locale to make it work and therefore is fully thread safe. But AFAIK it uses the Locale Data Markup Language specification and not Python's Format Specification Mini-Language. Users will require to learn yet another thing.

Options:

  1. Use format with n as the default.
    🟢 works for all python numerical types out of the box
    🟢 localizable,
    🟢 builtin, less code to maintain
    🔴 changes current representation
    🔴 not thread safe
    🟢 does not require learning a new mini language
  2. Use format with distinct defaults for float and int
    🟡 works for all numerical python types, with distinct code paths.
    🟢 localizable,
    🔴 requires building a specific function
    🟡 some representations will work, others not (corner cases)
    🔴 not thread safe
    🟢 does not require learning a new mini language
  3. Use babel's format_decimal
    🟢 works for all numerical python types out of the box
    🟢 localizable,
    🟢 in babel, less code to maintain
    🟡 maybe changes current representation
    🟢 not thread safe
    🔴 require learning a new mini language

Another option will be exploring translating python mini language to babel (is something available?). Or use a package that implements a localizable with arguments format if available.

Ps.- @MichaelTiemannOSC I have a great way to sort units in mind. Simple, easy to implement and that will available in all formatters out of the box.

@MichaelTiemannOSC
Copy link
Collaborator

I read through you questions and your recent commits. I don't have the knowledge myself to answer them, but perhaps a release candidate with the option you think is cleanest (#3?) can get enough user feedback to provide more data.

I look forward to seeing the new sorting idea you have in mind!

@dalito
Copy link
Contributor

dalito commented Jan 19, 2024

I am slightly in favor of option 3 but I haven't used babel localization much. So there may be implications that I don't foresee.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 20, 2024

@dalito I think it might be possible to generate UCUM notation! Let's stabilize the API and see.
@MichaelTiemannOSC good idea. I think we sshould release an alpha or beta version and invite people to try it

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 20, 2024

🎊 All test are passing

Disclaimer: I had to changed a few tests

  • Proper localization of number 24.1 in localized fr_FR is 24,1
  • es_AR was changed to the more common es_ES
  • In the empty format vatios por centímetros should be vatios / centímetros
  • everything related to default_format and missing specific unit format was labeled as xfail until the behavior is clear

There is still work to do:

  • docstring
  • documenation
  • clean up and organize code
  • implement sorting
  • solve/decide number related issue (see above)

I will merge this into main tomorrow and continue working.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 20, 2024

Performance

I was a little worried about the performance hit. But, it actually got faster for most cases.

Formatting 3.4 * ureg.meter ** 2 / ureg.second * ureg.kg

locale - fmt develop / 0.23
None - D 0.96
None - D~ 0.83
None - P 0.99
None - P~ 0.84
None - C 0.99
None - C~ 0.84
None - H 0.91
None - H~ 0.86
None - L 1.01
None - L~ 0.93
None - Lx 1.08
None - Lx~ 1.09

Comparing a different locale or numeric modifiers is not fair because 0.23 was not complying with these codes for all cases.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

I started to update the documentation: https://pint.readthedocs.io/en/develop/user/formatting.html

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

Is there a NIST rule for pluralization of compound units?

  1. 2 meter / second
  2. 2 meters / second
  3. 2 meters / seconds

I think the correct way is 2, but I was not able to find this in the NIST handbook.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

@MichaelTiemannOSC you will notice in develop that the formatter function now has a sort_func argument. This takes any function from Iterable[tuple[str, Number]] to Iterable[tuple[str, Number]].

We need to find a good way to expose this. One option is to have an order or sorting or similar configuration flag in the formatter.

ureg.formatter.order = None
ureg.formatter.order = "alpha"
ureg.formatter.order = "ISO80000"
ureg.formatter.order = sort
ureg.formatter.order = some_function_that_I_made

@MichaelTiemannOSC
Copy link
Collaborator

My reading of this section (https://www.nist.gov/pml/special-publication-811/nist-guide-si-chapter-9-rules-and-style-conventions-spelling-unit-names#97) says:

  1. Unit names have different pluralization rules (which follow English grammar) as compared to unit symbols (which are not pluralized, since ms should be read as the symbol for milliseconds not the plural of meters).
  2. When it comes to pluralizing names, the idioms I've learned to pluralize is that the we only pluralize the last unit name (so Newton meters not Newtons meter nor Newton meter) except in the denominator, where we never pluralize (because denominators are presumed to be singular). Thus Newton meters per second not Newton meters per seconds.
  3. If the numerator is clearly singlular (1 meter / second) then we don't say 1 meters / second). But if not, (X meters / second), we might say X meters / second => 1 meter/second when X==1.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

How about this: If the magnitude is larger than 1, then pluralize the last unit in the numerator.

@MichaelTiemannOSC
Copy link
Collaborator

How about this: If the magnitude is larger than 1, then pluralize the last unit in the numerator.

Not quite:

Half a meter per second vs. 0.5 meters per second

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

You are right.

More questions:

  1. Do we always pluralize or just when localizing?
  2. How / or per is selected? By format type (e.g. pretty print uses per but default uses /) or if localization occurs?
  3. Does pluralization works the same in 3 meter / second than in 3 meter second^-1?

@MichaelTiemannOSC
Copy link
Collaborator

You are right.

More questions:

  1. Do we always pluralize or just when localizing?
  2. How / or per is selected? By format type (e.g. pretty print uses per but default uses /) or if localization occurs?
  3. Does pluralization works the same in 3 meter / second than in 3 meter second^-1?

My advice on (1) is to let the respective open source localization experts answer for their locale. I understand that it becomes a political question to ask "when localizing?" because in some sense, any standard must be read within the context of some locale. But I think we'll get feedback fairly quickly on when and how pluralization rules apply in the default case vs. various locales. As long as we have the pluralization infrastructure we can implement what's needed.

On (2), NIST gives this rule for / vs. per: https://www.nist.gov/pml/special-publication-811/nist-guide-si-chapter-9-rules-and-style-conventions-spelling-unit-names#98

On (3), NIST says that both are degenerate. Reflecting on my own idiomatic interpretation, I would say that mathematical formulation creates an entity syntactically distinct from the quantity. That syntactic entity, which is mathematical in nature (meter / second or meter second^-1), does not take plural form. But in the case where the mathematical form is identical to natural english (e.g., meter by itself), then we treat that as part of a spoken rather than formulaic construction, and we say 3 meters rather than 3 meter. When we use per instead of /, it's a spoken expression, not a mathematical one, so English idioms apply: 3 meters per second.

@dalito
Copy link
Contributor

dalito commented Jan 22, 2024

I would also say "just when localizing". German has interestingly no plural for units.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

In relation to 1, could be something like this

When locale is None: English number formatting and canonical unit names
When locale is not None: Locale number formatting and translated unit names, pluralizing when appropriate (TBD)

>>> import pint
>>> ureg = pint.UnitRegistry()
>>> q = 2.3 * ureg.meter / ureg.second
>>> q
<Quantity(2.3, 'meter / second')>
>>> ureg.formatter.set_locale(None)
>>> str(q)
'2.3 meter / second'
>>> ureg.formatter.set_locale("en_US")
>>> str(q)
'2.3 meters / second'
>>> ureg.formatter.set_locale("de_DE")
>>> str(q)
'2,3 Meter / Sekunden'

@MichaelTiemannOSC
Copy link
Collaborator

MichaelTiemannOSC commented Jan 22, 2024

"Pluralizing where appropriate" meaning '2.3 meters per second' in en_US.

@dalito
Copy link
Contributor

dalito commented Jan 22, 2024

The correct German version would be '2,3 Meter / Sekunde' (without plural-n at the end).

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

The correct German version would be '2,3 Meter / Sekunde' (without plural-n at the end).

Sorry about that!

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

"Pluralizing where appropriate" meaning '2.3 meters per second' in en_US.

This what I am not sure. I am pretty sure that in locale is None, / is the right thing. i.e. I think that locale is None is like the definition file: nglish number formatting, canonical unit names and standard math ops.

But do we make per dependent on the formatting code or always true for locale is not None

For example, could be that

>>> import pint
>>> ureg = pint.UnitRegistry()
>>> q = 2.3 * ureg.meter / ureg.second
>>> q
<Quantity(2.3, 'meter / second')>
>>> ureg.formatter.set_locale("en_US")
>>> "{q:D}" # default
'2.3 meters / second'
>>> "{q:P}" # pretty
>>> str(q)
'2.3 meters per second'

@dalito
Copy link
Contributor

dalito commented Jan 22, 2024

But thinking about it some units are pluralized, time units for example when in nominator, e.g. 1 Sekunde but 2 Sekunden. Hopefully babel knows the rules. I don't know them just the language.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

(sorry for editing your comments, I press the wrong icon!)

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

But thinking about it some units are pluralized, time units for example when in nominator, e.g. 1 Sekunde but 2 Sekunden. Hopefully babel knows the rules.

I looked at babel and it kind of knows, but not quite. It also does not uses Pythons' string formatting mini language. Actually took part of its code and adapt it. But I can look more into it.

@dalito
Copy link
Contributor

dalito commented Jan 22, 2024

Don't worry! Getting German right should not hold this PR.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

But thinking about it some units are pluralized, time units for example when in nominator, e.g. 1 Sekunde but 2 Sekunden. Hopefully babel knows the rules.

I looked at babel and it kind of knows, but not quite. It also does not uses Pythons' string formatting mini language. Actually took part of its code and adapt it. But I can look more into it.

In simple terms:

  1. Works really well for single units
  2. Works well for some compound units (unit A / unit B)
  3. Works very bad for all other compound units

That is why what I have done is pick the single unit translator and use it to build a compound translation.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

@MichaelTiemannOSC you will notice in develop that the formatter function now has a sort_func argument. This takes any function from Iterable[tuple[str, Number]] to Iterable[tuple[str, Number]].

We need to find a good way to expose this. One option is to have an order or sorting or similar configuration flag in the formatter.

ureg.formatter.order = None
ureg.formatter.order = "alpha"
ureg.formatter.order = "ISO80000"
ureg.formatter.order = sort
ureg.formatter.order = some_function_that_I_made

I just realized that this will not work. The high level API is ok (we need to bikeshed naming, etc). But internally wont work. Now, pint does the following things.

  1. If "~" is present, generate the short form version of the units
  2. If locale is present, translate units (includes pluralization)
  3. If sorting is requested, sort
  4. Build the unit string

This is fine, as long as pluralization works on all units or no units. But it has to work in only one determined by its position (for example in the last unit of the numerator if there are multiple units), then this will not work and localization will need to be done in two steps

  1. If "~" is present, generate the short form version of the units
  2. If locale is present, translate units (includes pluralization)
  3. If sorting is requested, sort
  4. If locale is present, pluralize according to rule.
  5. Build the unit string

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 22, 2024

All test are passing (including doc building and doctest). I am merging this into master (even if we still need to decide about certain things!)

@MichaelTiemannOSC
Copy link
Collaborator

That's great news. I have once again confused myself and/or my own fork of Pint due to my limited understanding of Git. I will try to sort that out and get a fresh PR against the latest master. Git is my friend 99% of the time, and then 1% of the time it just stabs me in the back. Again, and again, and again.

@andrewgsavage
Copy link
Collaborator

came accross this module which may be worth supporting for formatting in future https://sciform.readthedocs.io/en/stable/

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 23, 2024

That's great news. I have once again confused myself and/or my own fork of Pint due to my limited understanding of Git. I will try to sort that out and get a fresh PR against the latest master. Git is my friend 99% of the time, and then 1% of the time it just stabs me in the back. Again, and again, and again.

Been there. I feel your pain!

@MichaelTiemannOSC
Copy link
Collaborator

I'm looking to unify my #1841 test case with some of the new patterns I see in the latest pint test suite:

def test_issues_1841(subtests):
    import pint

    ur = UnitRegistry()

    for x, spec, result in (
            (ur.Unit(UnitsContainer(hour=1,watt=1)), "P~", "W·h"),
            (ur.Unit(UnitsContainer(ampere=1,volt=1)), "P~", "V·A"),
            (ur.Unit(UnitsContainer(meter=1,newton=1)), "P~", "N·m"),
    ):
        with subtests.test(spec):
            ur.default_format = spec
            breakpoint()
            assert f"{x}" == result, f"Failed for {spec}, {result}"

To make this work, we cannot pass sort_func from the point of use ("{x}"), but we could set ur.sort_func.

Recapping: all the GenericSPECIALRegistries inherit from GenericPlainRegistry, which sets self.formatter = pint.delegates.Formatter(). And that starts life as FullFormatter, which has the aforementioned default_format instance variable. FullFormatter manages the delegation to the various SUBformatters based on spec. And the SUBformatters access default_format not via inheritance, but by climbing back up through the registry to registry.formatter.default_format.

Since all comparable units have to belong to the same registry, we could define a similar instance variable default_sort_func that the various SUBformatters consistently use to properly output compound units. Otherwise, I see forcing users to do a lot of work to enable/disable dimensional sorting. I'm going to take a crack at that approach, but feel free to interject if you think I'm missing something.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 23, 2024

In my opinion, Formatters only need to have certain functions: format_X, with X being magnitude, quantity, etc. Being
modeled after format, each should have at least two arguments in addition to self: the actual object and the corresponding spec string.

If somebody writes :

class SecretFormatter:

      def format_quantity(self, quantity, spec):
            return "**********" 

should work. This makes it very easy to replace the formatter for custom cases, prototype ideas, etc.

In my opinion, everything else is an extra and should not be required by the user, nor Pint core (although it should be provided by default Pint Formatter, see below)

This is not the case now, due to backwards compatibility issues (i.e. to able to do ureg.default_format = at the registry level you need to make sure that the default formatter has a default_format attribute. The same with locale. And also due to the format_babel methods.)

I was also not able to make simple architecture in which is:

  1. fast
  2. easy to read
  3. and keep the signature simple as this: format_quantity(self, quantity, spec)

The Formatter shipped by Pint by default should be very close to the current one (FullFormatter):

  1. there is a way to specify a default format (done)
  2. there is a way to specify a default locale and babel length (done)
  3. there is a way to specify a default sorting order (in your PR)

Not sure if the SubFormatters (which are Formatters themselves :-D) should have this as well. Now they do not have a locale or format or babel_length, they get this through arguments. Should we add one more (sort_func)?

@MichaelTiemannOSC
Copy link
Collaborator

As you can see from the PR, quantity (and unit) have pathways to registries, which encodes tremendous power and flexibility. I needed to do some magic to make formatter work with arguments stripped of these powers (because all the arguments are reduced to strings and formatting strings). Feel free to comment on the lines of code within the PR about what makes you happy and what makes you sad.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 23, 2024

@MichaelTiemannOSC The PR looks great. Indeed, certain features require access to the registry (the same is true for "~" spec which requires a short form of the unit). I think this is unavoidable. I was thinking to have a function that works as a guard and can be reused.

def get_required_registry(obj):
    try:
         return obj._REGISTRY
    except Exception:
         raise ValueError( < a nicely written error msg >")

I thought about this a lot. Maybe we should make the formatter only available to registry bounded objects ... but in certain cases you want to format an unbounded object and is fine.

@MichaelTiemannOSC
Copy link
Collaborator

If the formatter took an optional registry parameter that would make this PR much simpler. I'll work that up to show you.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 23, 2024

If the formatter took an optional registry parameter that would make this PR much simpler. I'll work that up to show you.

The problem is doing it at the registry level is that then every Formatter needs to implement this option. That is why I was promotting somethign along the lines of

ureg.formatter.sort_func = XXX

@MichaelTiemannOSC
Copy link
Collaborator

If the formatter took an optional registry parameter that would make this PR much simpler. I'll work that up to show you.

The problem is doing it at the registry level is that then every Formatter needs to implement this option. That is why I was promotting somethign along the lines of

ureg.formatter.sort_func = XXX

OK...so related to that, since all units must come from the same registry, we can read the registry from the first item that has a meaningful registry (dimensionless might not...and it might not matter if it's the only "dimension"). I'll see if there's something I can still simplify around that idea.

@hgrecco
Copy link
Owner Author

hgrecco commented Jan 23, 2024

You are right, if we sort there we have no registry. Take a look at format_compound_unit (

def format_compound_unit(
) Maybe we could figure out together a better way to achieve this remembering that we must do:

  1. If "~" is present, generate the short form version of the units
  2. If locale is present, translate units (includes pluralization)
  3. If sorting is requested, sort
  4. If locale is present, pluralize according to rule.
  5. Build the unit string

I decided to split it but maybe it was the wrong call and we should put 1 to 4 together.

@keewis
Copy link
Contributor

keewis commented Jan 23, 2024

I'd probably combine 2 and 4 together (i.e. apply locale), which would be mutually exclusive to 1. With that we can first sort (unless you need to sort alphabetically by the unit instead of its dimensions?), format the individual units and finally put together the full string:

  1. sort if requested
  2. format units
  • either: construct short form version of the units
  • or: translate units and pluralize according to the locale (might need to have knowledge about the full unit expression)
  1. build the unit string

@MichaelTiemannOSC
Copy link
Collaborator

I've merged sorting into format_compound_unit after translation (so that sorting by alpha works in the target language). I think it really belongs there: #1926.

What I don't understand is why doc build is now failing...shouldn't there be an easy way for me to see that I should merge from upstream if I'm now out-of-date? Anyways...comments welcome.

@hgrecco
Copy link
Owner Author

hgrecco commented Mar 11, 2024

I just pushed to develop a new version of the formatter delegate. The main advantage is that now pluralization of localized units works as expected in an easy-to-understand workflow.

It also incorporates the ability to sort units. The current sorting options are: None, by show name (i.e. localized), by unit name (non localized) and by dimensions.

Not sure how this should be handled from the user side. Options are:

  1. defined by the formatting code "D", "C", etc ...
  2. configurable
    2.a. configurable via formatter attribute
    2.b. extra argument to (some) Formatter function + default 2a
    2.c. new formatting codes (e.g. $n, $s, $u, $d) + default 2a

I still need to work to speed up the code.

@hgrecco
Copy link
Owner Author

hgrecco commented Mar 14, 2024

What are your thoughts on this @MichaelTiemannOSC @keewis @dalito @andrewgsavage ?

@andrewgsavage
Copy link
Collaborator

It needs to be distinct from the formatting code as you may wish for a different order to the sorted version.

My preference is for a configurable default option, that can be overwridden when needed (ie when units don't come out in the order you want). So your option 2c

I think it is worth adding a 'reversed' option too; that would be the most obvious way to reorder 2 units when they're not in order. Maybe this could be a modifier?
Could also do something for 3 units, eg for units a, b, c the user would have an ordering they want. so have a character (a-e below) in the formatting string to get to the desired order:

a -> b, c, a
b -> c, a, b
c -> b, a, c
d -> c, b, a
e -> a, c, b

@andrewgsavage
Copy link
Collaborator

another option could be to have a list of common 'unit pairs' that should be kept together, eg VA, Wh

maybe it would need to be dimensionality pairs to also catch Ws

@atravert
Copy link

atravert commented Jul 23, 2024

Hello,

Can you let me know whether it is still possible to change the behaviour of default formatters in pint 0.24 ?

in previous versions we did so like this:

from pint import formatting

del formatting._FORMATTERS["P"]

@formatting.register_unit_format("P")
def format_pretty(unit, registry, **options):
 return formatting.formatter(
     unit.items(),
     as_ratio=False,   # default is True
     single_denominator=False,
     product_fmt=".",
     division_fmt="/",
     power_fmt="{}{}",
     parentheses_fmt="({})",
      exp_call=formatting._pretty_fmt_exponent,
     )

in pint 0.24 this does not work anymore. I can still apparently remove the default formatter :
del formatting.REGISTERED_FORMATTERS["P"]

but nothing changes whatever custom format I put in register_unit_format("P")...

@keewis
Copy link
Contributor

keewis commented Jul 28, 2024

that's because the default formatter delegate, FullFormatter, has its own registry of formatters:

self._formatters = {}
self._formatters["raw"] = RawFormatter(registry)
self._formatters["D"] = DefaultFormatter(registry)
self._formatters["H"] = HTMLFormatter(registry)
self._formatters["P"] = PrettyFormatter(registry)
self._formatters["Lx"] = SIunitxFormatter(registry)
self._formatters["L"] = LatexFormatter(registry)
self._formatters["C"] = CompactFormatter(registry)
. To modify existing behavior, you'd have to remove that particular formatter, as well:

import pint
from pint import formatting

del formatting.REGISTERED_FORMATTERS["P"]

@pint.register_unit_format("P")
def format_pretty(unit, registry, **options):
    return formatting.formatter(
        unit.items(),
        as_ratio=False,   # default is True
        single_denominator=False,
        product_fmt=".",
        division_fmt="/",
        power_fmt="{}{}",
        parentheses_fmt="({})",
        exp_call=formatting._pretty_fmt_exponent,
    )

ureg = pint.UnitRegistry()
del ureg.formatter._formatters["P"]
q = ureg.Quantity(1, "m")
f"{q:~P}"

Not sure if this is a supported use case, though.

@atravert
Copy link

Thank you very much for your reply @keewis. It works, but indeed a dirty hack, which causes later problems in my case (because ureg.formatter._formatters[“P”] is permanently deleted).

So, as suggested in the documentation, I have subclassed the default formatters as follows:

from pint import UnitRegistry
from pint.delegates.formatter.plain import PrettyFormatter
from pint.delegates.formatter.full import FullFormatter
from pint.delegates.formatter._compound_unit_helpers import prepare_compount_unit
from pint.delegates.formatter._format_helpers import formatter, pretty_fmt_exponent



class MyPrettyFormatter(PrettyFormatter):

    def format_unit(self, unit, uspec, sort_func, **babel_kwds) -> str:
        numerator, denominator = prepare_compount_unit(
            unit,
            uspec,
            sort_func=sort_func,
            **babel_kwds,
            registry=self._registry,
        )

        return formatter(
            numerator,
            denominator,
            as_ratio=False,
            single_denominator=False,
            product_fmt=" ",
            division_fmt="/",
            power_fmt="{}{}",
            parentheses_fmt=r"({})",
            exp_call=pretty_fmt_exponent
        )


class MyFullFormatter(FullFormatter):

    default_format: str = "~P"

    def __init__(self, registry: UnitRegistry | None = None):
        super().__init__(registry)

        self._formatters = {}
        self._formatters["P"] = MyPrettyFormatter(registry)


ur = UnitRegistry()
ur.formatter = MyFullFormatter(ur)
ur.formatter._registry = ur


q = 1 * ur.meter ** 2 / ur.second

assert f"{q}" == "1.0 m² s⁻¹"
assert f"{q: ~D}" == ' 1.0 m ** 2 / s'

and this solved my problem. Thank you again for your fast reply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants