Moving formatting into a delegate #1913
Comments
@keewis I would really like your feedback on this, particularly into the |
I think I need to understand your plans a bit better to be able to comment on them. In particular, I'd like to understand how you imagine the delegates to be used (the implementation I can slowly try to understand once that's done), and how this impacts the use of the I'd also recommend thinking about the different ways to customize the format string now: this would make it much easier to make room for features like the "custom modifiers" I had a PR for (which unfortunately is stale now). The |
My idea is to start pushing composition over inheritance as much as I can in Pint. (I would also like to discuss the idea of adopting protocols, but that is another discussion!)
For the end users, there would not be much of a change. In the next versions, no change at all. Later, certain configurations are going to be within
For the advanced users, it would allow them to swap the formatter for their own by anything that implements
class MyFormatter:
    """Here goes the code
    """

ureg.formatter = MyFormatter()
For pint developers, it allows them to better organize the code by having all formatting things in a single place. Also, it could simplify testing because in certain tests we could replace the more complex formatter with something simpler. We can make this work with |
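A minimal, self-contained sketch of the delegation idea described in the comment above. It does not use Pint: `TinyRegistry`, `TinyQuantity`, and `DefaultFormatter` are hypothetical stand-ins, and only the `format_quantity(quantity, spec)` method name is taken from this thread.

```python
class DefaultFormatter:
    """Hypothetical stand-in for the formatter a registry ships with."""

    def format_quantity(self, quantity, spec=""):
        return f"{format(quantity.magnitude, spec)} {quantity.units}"


class TinyRegistry:
    """Hypothetical registry that owns a swappable formatter (composition)."""

    def __init__(self):
        self.formatter = DefaultFormatter()


class TinyQuantity:
    """Hypothetical quantity that delegates formatting to its registry."""

    def __init__(self, magnitude, units, registry):
        self.magnitude = magnitude
        self.units = units
        self._registry = registry

    def __format__(self, spec):
        # Delegate instead of hard-coding the formatting logic here.
        return self._registry.formatter.format_quantity(self, spec)


class MyFormatter:
    """A replacement formatter; it does not inherit from anything."""

    def format_quantity(self, quantity, spec=""):
        return f"<{quantity.magnitude} in {quantity.units}>"


ureg = TinyRegistry()
q = TinyQuantity(3.0, "meter", ureg)
default_text = format(q, ".1f")

ureg.formatter = MyFormatter()  # swap the delegate, nothing else changes
custom_text = format(q, ".1f")

print(default_text)
print(custom_text)
```

The same swap is what would let a test suite replace a complex formatter with a trivial one.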
By the way @MichaelTiemannOSC was also trying to implement certain formatting improvements that I think should be easier to implement with this delegate. |
If I understand correctly, then, you're planning to move (and refactor) the code that is currently in the registry to the default formatter object. With the effect that instead of hard-coding the format parsing and the call of the appropriate formatting function, you'd even be able to customize those? |
Exactly, this is partially done in If you desire a specific output format, you can simply exchange the formatter object easily by a custom without creating a new formatting string. If you want to create a new spec, you still can. |
I was testing a few ideas. A few comments:
import pint
from pint.delegates.formatter.base_formatter import BaseFormatter

class NewFormatter(BaseFormatter):
    def format_quantity(self, quantity, spec: str = "") -> str:
        return "Magnitude: {}\nUnits: {}".format(quantity.m, quantity.u)

ureg = pint.UnitRegistry()
ureg.formatter = NewFormatter()
print(str(3 * ureg.meter))
returns
|
The failure of the docs build is not due to #1864. I think it is due to things changing in Sphinx that break various pins within pint. The changes to allow sorting by dimension order can easily be overridden in child classes as well ( |
This is just to show some of the progress made with the new formatter. The new organization not only is cleaner, but some things are fixed automatically. For example: Old version
New version
Take a look at develop ... (tests are failing, but it is shaping up nicely) |
Fewer and fewer tests are failing, but now there is a decision to be made. While I have been trying to keep everything backwards compatible, it is becoming hard to keep the same string representation for numbers. As I mentioned before, I am fixing the localized version (that is a change, but a valuable one). The previous behavior was broken. But there is one more: number of digits. Are we OK with changing how numbers are formatted? Until now, numbers were formatted into strings sometimes with But:
>>> str(24.0)
'24.0'
>>> format(24.0, "n")
'24'
>>> str(1.234567890)
'1.23456789'
>>> format(1.234567890, "n")
'1.23457'
In other words, using the Another way would be to use babel format_decimal. One very nice thing about this is that it does not require changing the locale to make it work and is therefore fully thread safe. But AFAIK it uses the Locale Data Markup Language specification and not Python's Format Specification Mini-Language. Users would need to learn yet another thing. Options:
Another option would be to explore translating Python's mini-language to babel (is something available?). Or use a package that implements a localizable with arguments
P.S. @MichaelTiemannOSC I have a great way to sort units in mind. Simple, easy to implement, and it will be available in all formatters out of the box. |
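The digit-count difference discussed in the comment above can be checked with the standard library alone. This sketch only illustrates the issue; the explicit precision in `as_n_wide` is an assumption about one possible workaround, not something decided in this thread, and the results assume the process has not changed its numeric locale.

```python
value = 1.234567890

as_str = str(value)               # shortest representation that round-trips
as_n = format(value, "n")         # locale-aware, defaults to 6 significant digits
as_n_wide = format(value, ".9n")  # explicit precision keeps nine significant digits

print(as_str, as_n, as_n_wide)
```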
I read through your questions and your recent commits. I don't have the knowledge myself to answer them, but perhaps a release candidate with the option you think is cleanest (#3?) can get enough user feedback to provide more data. I look forward to seeing the new sorting idea you have in mind! |
I am slightly in favor of option 3 but I haven't used babel localization much. So there may be implications that I don't foresee. |
@dalito I think it might be possible to generate UCUM notation! Let's stabilize the API and see. |
🎊 All tests are passing. Disclaimer: I had to change a few tests
There is still work to do:
I will merge this into main tomorrow and continue working. |
Performance I was a little worried about the performance hit. But, it actually got faster for most cases. Formatting
Comparing a different locale or numeric modifiers is not fair because 0.23 was not complying with these codes for all cases. |
I started to update the documentation: https://pint.readthedocs.io/en/develop/user/formatting.html |
Is there a NIST rule for pluralization of compound units?
I think the correct way is 2, but I was not able to find this in the NIST handbook. |
@MichaelTiemannOSC you will notice in develop that the formatter function now has a We need to find a good way to expose this. One option is to have an order, sorting, or similar configuration flag in the formatter.
ureg.formatter.order = None
ureg.formatter.order = "alpha"
ureg.formatter.order = "ISO80000"
ureg.formatter.order = sort
ureg.formatter.order = some_function_that_I_made |
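One way such a flag could be resolved internally is sketched below. This is not Pint's implementation: `resolve_order`, `apply_order`, the `(name, exponent)` tuples, and the handling of each value are assumptions made for illustration. The "ISO80000" value is left out because its ordering rules are not spelled out in this thread.

```python
from typing import Callable, Iterable, Optional, Union

UnitItem = tuple[str, int]  # hypothetical (unit name, exponent) pair
SortFunc = Callable[[Iterable[UnitItem]], Iterable[UnitItem]]


def resolve_order(order: Union[None, str, SortFunc]) -> Optional[SortFunc]:
    """Turn the user-facing flag into a sort function (or None for 'keep as is')."""
    if order is None:
        return None
    if callable(order):
        # Covers both the built-in `sorted` and user-written functions.
        return order
    if order == "alpha":
        return lambda items: sorted(items, key=lambda item: item[0])
    raise ValueError(f"Unknown order: {order!r}")


def apply_order(items: Iterable[UnitItem], order) -> list[UnitItem]:
    sort_func = resolve_order(order)
    return list(items) if sort_func is None else list(sort_func(items))


units = [("second", -1), ("meter", 1), ("kilogram", 1)]

unsorted_units = apply_order(units, None)
alpha_units = apply_order(units, "alpha")
# A user-supplied function: positive exponents first, original order otherwise.
custom_units = apply_order(units, lambda items: sorted(items, key=lambda i: -i[1]))

print(unsorted_units)
print(alpha_units)
print(custom_units)
```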
My reading of this section (https://www.nist.gov/pml/special-publication-811/nist-guide-si-chapter-9-rules-and-style-conventions-spelling-unit-names#97) says:
|
How about this: If the magnitude is larger than 1, then pluralize the last unit in the numerator. |
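A literal sketch of the rule proposed in the comment above, for discussion only. The function name, the list-of-names input, and the naive "append s" plural are all assumptions; it implements exactly that one-sentence rule and nothing from NIST or Pint.

```python
def pluralize_last_numerator(magnitude, numerator, denominator):
    """Pluralize only the last numerator unit when the magnitude exceeds 1."""
    names = list(numerator)
    if magnitude > 1 and names:
        names[-1] += "s"  # naive English plural; irregular names are ignored
    text = " ".join(names)
    if denominator:
        text += " / " + " ".join(denominator)
    return f"{magnitude} {text}"


speed = pluralize_last_numerator(2.3, ["meter"], ["second"])
torque = pluralize_last_numerator(1, ["newton", "meter"], [])

print(speed)
print(torque)
```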
Not quite:
|
You are right. More questions:
|
My advice on (1) is to let the respective open source localization experts answer for their locale. I understand that it becomes a political question to ask "when localizing?" because in some sense, any standard must be read within the context of some locale. But I think we'll get feedback fairly quickly on when and how pluralization rules apply in the On (2), NIST gives this rule for On (3), NIST says that both are degenerate. Reflecting on my own idiomatic interpretation, I would say that mathematical formulation creates an entity syntactically distinct from the quantity. That syntactic entity, which is mathematical in nature ( |
I would also say "just when localizing". German has interestingly no plural for units. |
In relation to 1, it could be something like this. When locale is None: English number formatting and canonical unit names
>>> import pint
>>> ureg = pint.UnitRegistry()
>>> q = 2.3 * ureg.meter / ureg.second
>>> q
<Quantity(2.3, 'meter / second')>
>>> ureg.formatter.set_locale(None)
>>> str(q)
'2.3 meter / second'
>>> ureg.formatter.set_locale("en_US")
>>> str(q)
'2.3 meters / second'
>>> ureg.formatter.set_locale("de_DE")
>>> str(q)
'2,3 Meter / Sekunden' |
"Pluralizing where appropriate" meaning |
The correct German version would be |
Sorry about that! |
This is what I am not sure about. I am pretty sure that in But do we make For example, it could be that
>>> import pint
>>> ureg = pint.UnitRegistry()
>>> q = 2.3 * ureg.meter / ureg.second
>>> q
<Quantity(2.3, 'meter / second')>
>>> ureg.formatter.set_locale("en_US")
>>> f"{q:D}" # default
'2.3 meters / second'
>>> f"{q:P}" # pretty
>>> str(q)
'2.3 meters per second' |
But thinking about it, some units are pluralized, time units for example, when in the numerator, e.g. |
(sorry for editing your comments, I pressed the wrong icon!) |
I looked at babel and it kind of knows, but not quite. It also does not use Python's string formatting mini-language. I actually took part of its code and adapted it. But I can look more into it. |
Don't worry! Getting German right should not hold up this PR. |
In simple terms:
That is why what I have done is pick the single unit translator and use it to build a compound translation. |
I just realized that this will not work. The high-level API is OK (we need to bikeshed naming, etc.), but internally it won't work. Currently, pint does the following things.
This is fine, as long as pluralization applies to all units or to no units. But if it has to apply to only one unit, determined by its position (for example, the last unit of the numerator if there are multiple units), then this will not work and localization will need to be done in two steps
|
All tests are passing (including doc building and doctest). I am merging this into master (even if we still need to decide about certain things!) |
That's great news. I have once again confused myself and/or my own fork of Pint due to my limited understanding of Git. I will try to sort that out and get a fresh PR against the latest master. Git is my friend 99% of the time, and then 1% of the time it just stabs me in the back. Again, and again, and again. |
Came across this module, which may be worth supporting for formatting in the future: https://sciform.readthedocs.io/en/stable/ |
Been there. I feel your pain! |
I'm looking to unify my #1841 test case with some of the new patterns I see in the latest pint test suite:
To make this work, we cannot pass Recapping: all the GenericSPECIALRegistries inherit from GenericPlainRegistry, which sets Since all comparable units have to belong to the same registry, we could define a similar instance variable |
In my opinion, if somebody writes:
class SecretFormatter:
    def format_quantity(self, quantity, spec):
        return "**********"

it should work. This makes it very easy to replace the formatter for custom cases, prototype ideas, etc. In my opinion, everything else is an extra and should not be required by the user, nor by Pint core (although it should be provided by the default Pint Formatter, see below). This is not the case now, due to backwards compatibility issues (i.e. to be able to do I was also not able to make a simple architecture in which is:
The Formatter shipped by Pint by default should be very close to the current one (FullFormatter):
Not sure if the SubFormatters (which are Formatters themselves :-D) should have this as well. Now they do not have a locale or format or babel_length, they get this through arguments. Should we add one more (sort_func)? |
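The "anything that implements `format_quantity` should work" requirement maps naturally onto a structural protocol. The sketch below is an assumption about how that could be expressed, not Pint code: `QuantityFormatter` and `install_formatter` are hypothetical names, and a plain dict stands in for the registry.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class QuantityFormatter(Protocol):
    """Hypothetical minimal contract: a single method, no base class needed."""

    def format_quantity(self, quantity: Any, spec: str) -> str: ...


class SecretFormatter:
    def format_quantity(self, quantity, spec):
        return "**********"


class NotAFormatter:
    pass


def install_formatter(target: dict, formatter: QuantityFormatter) -> None:
    # runtime_checkable only verifies that the method exists, not its signature.
    if not isinstance(formatter, QuantityFormatter):
        raise TypeError("formatter must implement format_quantity(quantity, spec)")
    target["formatter"] = formatter


settings = {}
install_formatter(settings, SecretFormatter())
masked = settings["formatter"].format_quantity(None, "")
print(masked)
```

Extras such as locale or sorting would then live on the default formatter rather than in the contract itself.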
As you can see from the PR, |
@MichaelTiemannOSC The PR looks great. Indeed, certain features require access to the registry (the same is true for the "~" spec, which requires a short form of the unit). I think this is unavoidable. I was thinking of having a function that works as a guard and can be reused.
def get_required_registry(obj):
    try:
        return obj._REGISTRY
    except Exception:
        raise ValueError("<a nicely written error msg>")

I thought about this a lot. Maybe we should make the formatter only available to registry-bound objects ... but in certain cases you want to format an unbound object, and that is fine. |
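A runnable version of the guard idea, using stand-in classes instead of Pint objects. The `_REGISTRY` attribute name comes from the comment above; treating a `None` value as "unbound" and the wording of the error message are assumptions.

```python
class FakeRegistry:
    """Hypothetical stand-in for a unit registry."""


class BoundUnit:
    _REGISTRY = FakeRegistry()


class UnboundUnit:
    pass


def get_required_registry(obj):
    """Return the registry attached to obj, or fail with a readable error."""
    registry = getattr(obj, "_REGISTRY", None)
    if registry is None:
        raise ValueError(
            f"{type(obj).__name__} is not attached to a registry, "
            "but this format option needs one."
        )
    return registry


found = get_required_registry(BoundUnit())

try:
    get_required_registry(UnboundUnit())
    error_text = ""
except ValueError as exc:
    error_text = str(exc)

print(type(found).__name__)
print(error_text)
```

Only the features that truly need the registry would call the guard, so unbound objects could still be formatted with the simpler options.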
If the formatter took an optional registry parameter that would make this PR much simpler. I'll work that up to show you. |
The problem with doing it at the registry level is that then every Formatter needs to implement this option. That is why I was promoting something along the lines of
ureg.formatter.sort_func = XXX |
OK...so related to that, since all units must come from the same registry, we can read the registry from the first item that has a meaningful registry ( |
You are right, if we sort there we have no registry. Take a look at format_compound_unit (
I decided to split it but maybe it was the wrong call and we should put 1 to 4 together. |
I'd probably combine 2 and 4 together (i.e. apply locale), which would be mutually exclusive to 1. With that we can first sort (unless you need to sort alphabetically by the unit instead of its dimensions?), format the individual units and finally put together the full string:
|
I've merged sorting into What I don't understand is why doc build is now failing...shouldn't there be an easy way for me to see that I should merge from upstream if I'm now out-of-date? Anyways...comments welcome. |
I just pushed to develop a new version of the formatter delegate. The main advantage is that now pluralization of localized units works as expected in an easy-to-understand workflow. It also incorporates the ability to sort units. The current sorting options are: None, by show name (i.e. localized), by unit name (non localized) and by dimensions. Not sure how this should be handled from the user side. Options are:
I still need to work to speed up the code. |
What are your thoughts on this @MichaelTiemannOSC @keewis @dalito @andrewgsavage ? |
It needs to be distinct from the formatting code, as you may wish for a different order from the sorted version. My preference is for a configurable default option that can be overridden when needed (i.e. when units don't come out in the order you want). So your option 2c. I think it is worth adding a 'reversed' option too; that would be the most obvious way to reorder 2 units when they're not in order. Maybe this could be a modifier?
|
Another option could be to have a list of common 'unit pairs' that should be kept together, e.g. VA, Wh. Maybe it would need to be |
Hello,
Can you let me know whether it is still possible to change the behaviour of the default formatters in pint 0.24? In previous versions we did so like this:
from pint import formatting

del formatting._FORMATTERS["P"]

@formatting.register_unit_format("P")
def format_pretty(unit, registry, **options):
    return formatting.formatter(
        unit.items(),
        as_ratio=False,  # default is True
        single_denominator=False,
        product_fmt=".",
        division_fmt="/",
        power_fmt="{}{}",
        parentheses_fmt="({})",
        exp_call=formatting._pretty_fmt_exponent,
    )

In pint 0.24 this does not work anymore. I can still apparently remove the default formatter : but nothing changes whatever custom format I put in |
that's because the default formatter delegate, pint/pint/delegates/formatter/full.py Lines 72 to 79 in 67303b8
import pint
from pint import formatting

del formatting.REGISTERED_FORMATTERS["P"]

@pint.register_unit_format("P")
def format_pretty(unit, registry, **options):
    return formatting.formatter(
        unit.items(),
        as_ratio=False,  # default is True
        single_denominator=False,
        product_fmt=".",
        division_fmt="/",
        power_fmt="{}{}",
        parentheses_fmt="({})",
        exp_call=formatting._pretty_fmt_exponent,
    )

ureg = pint.UnitRegistry()
del ureg.formatter._formatters["P"]
q = ureg.Quantity(1, "m")
f"{q:~P}"
Not sure if this is a supported use case, though. |
Thank you very much for your reply @keewis. It works, but it is indeed a dirty hack, which causes later problems in my case (because ureg.formatter._formatters["P"] is permanently deleted). So, as suggested in the documentation, I have subclassed the default formatters as follows:
from pint import UnitRegistry
from pint.delegates.formatter.plain import PrettyFormatter
from pint.delegates.formatter.full import FullFormatter
from pint.delegates.formatter._compound_unit_helpers import prepare_compount_unit
from pint.delegates.formatter._format_helpers import formatter, pretty_fmt_exponent

class MyPrettyFormatter(PrettyFormatter):
    def format_unit(self, unit, uspec, sort_func, **babel_kwds) -> str:
        numerator, denominator = prepare_compount_unit(
            unit,
            uspec,
            sort_func=sort_func,
            **babel_kwds,
            registry=self._registry,
        )
        return formatter(
            numerator,
            denominator,
            as_ratio=False,
            single_denominator=False,
            product_fmt=" ",
            division_fmt="/",
            power_fmt="{}{}",
            parentheses_fmt=r"({})",
            exp_call=pretty_fmt_exponent,
        )

class MyFullFormatter(FullFormatter):
    default_format: str = "~P"

    def __init__(self, registry: UnitRegistry | None = None):
        super().__init__(registry)
        self._formatters = {}
        self._formatters["P"] = MyPrettyFormatter(registry)

ur = UnitRegistry()
ur.formatter = MyFullFormatter(ur)
ur.formatter._registry = ur
q = 1 * ur.meter ** 2 / ur.second
assert f"{q}" == "1.0 m² s⁻¹"
assert f"{q: ~D}" == ' 1.0 m ** 2 / s'

and this solved my problem. Thank you again for your fast reply. |
The purpose of this issue is to have a central place to discuss everything related to formatting in 0.24
In the next version of Pint, all formatting operations are going to be moved into a Delegate, as expressed in #1895.
Part of this work has been merged into master, and the rest is being tested in develop.
A few important things:
Please add other issues, or PRs
As of 9e0789c, separate_format_defaults is not implemented.
Please provide your insights or comments about the API and overall organization.