Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CF unit formatter incompatible with latest pint #522

Closed
aulemahal opened this issue Jun 28, 2024 · 5 comments · Fixed by #523
Closed

CF unit formatter incompatible with latest pint #522

aulemahal opened this issue Jun 28, 2024 · 5 comments · Fixed by #523
Assignees
Labels
bug Something isn't working opinion wanted User input requested

Comments

@aulemahal
Copy link
Contributor

With pint 0.24, the formatting machinery changed a lot. In 0.24.0 our formatter broke because we couldn't re-create Unit object, the registry was None (see hgrecco/pint#2009), this was fixed, but another bugfix broke the specific case of a dimensionless variable, see hgrecco/pint#2024.

MWE with cf-xarray

from cf_xarray import units

u = units.units.parse_units("")
f"{u:cf}"

raises the error shown in the pint issue.

@aulemahal aulemahal added bug Something isn't working opinion wanted User input requested labels Jun 28, 2024
@aulemahal aulemahal self-assigned this Jun 28, 2024
@aulemahal
Copy link
Contributor Author

aulemahal commented Jun 28, 2024

I realized our method was a bit complicated and could be made simpler using pint formatting tools instead of regex. However, I haven't found a solution that was backwards-compatible. Here are 3 solutions I found:

Custom formatter class

Idea suggested by @hgrecco

from pint.delegates.formatter.plain import DefaultFormatter
from pint.delegates.formatter._compound_unit_helpers import prepare_compount_unit
from pint import formatter

class CFUnitFormatter(DefaultFormatter):
    def format_unit(self, unit, uspec, sort_func, **babel_kwds):
        numerator, denominator = prepare_compount_unit(
            unit,
            "~D",
            sort_func=sort_func,
            **babel_kwds,
            registry=self._registry,
        )

        out = formatter(
            numerator,
            denominator,
            as_ratio=False,
            product_fmt=" ",
            power_fmt="{}{}",
            parentheses_fmt=r"{}",
        )
        out = out.replace("Δ°", "delta_deg")
        return out.replace("percent", "%")`

# Add to the registry
units.formatter._formatters['cf'] = CFUnitFormatter
  • Pro : User code does not need modification.
  • Con : Requires pinning pint >=0.24, will break eventually when prepare_compount_unit is modified (see comment in issue).

Simpler custom function

@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
    num = [(u, e) for u, e in unit.items() if e >= 0]
    den = [(u, e) for u, e in unit.items() if e < 0]
    # in pint < 0.24, the first argument of `formatter` is num + den
    # a test with packaging.version could make this work for all pint versions
    out = formatter(
        num,
        den,
        as_ratio=False,
        product_fmt="{} {}",
        power_fmt="{}{}"
    )

    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")

The trouble here is that we don't have control on the "shortening" of the units. What was previously f"{u:cf}" must become f"{u:~cf}". And (of course!) f"{u:~cf}" will not work with previous versions of cf_xarray.

  • Pro : Simple. Could be made to work with previous versions of pint by.
  • Con : Requires users to change their code.

Hack and forget

Simply fix the dimensionless issue and don't think about it anymore.

    @pint.register_unit_format("cf")
    def short_formatter(unit, registry, **options):
        import re

        # avoid issue in pint >= 0.24.1
        unit  = unit._units.__class__({k.replace('dimensionless', ''): v for k, v in unit._units.items()})
        # convert UnitContainer back to Unit
        unit = registry.Unit(unit)
        # Print units using abbreviations (millimeter -> mm)
        s = f"{unit:~D}"

        # Search and replace patterns
        pat = r"(?P<inverse>(?:1 )?/ )?(?P<unit>\w+)(?: \*\* (?P<pow>\d))?"

        def repl(m):
            i, u, p = m.groups()
            p = p or (1 if i else "")
            neg = "-" if i else ""

            return f"{u}{neg}{p}"

        out, n = re.subn(pat, repl, s)

        # Remove multiplications
        out = out.replace(" * ", " ")
        # Delta degrees:
        out = out.replace("Δ°", "delta_deg")
        return out.replace("percent", "%")
  • Pro: Only fails with 0.24.0, User code needs not be changed.
  • Con: Still weirdly hacky.

@aulemahal
Copy link
Contributor Author

aulemahal commented Jun 28, 2024

I found something that works in all cases!

@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
    # pint 0.24.1 gives this for dimensionless units
    if unit == {'dimensionless': 1}:
        return ""

    # Shorten the names
    unit = pint.util.UnitsContainer({
        registry._get_symbol(u): exp
        for u, exp in unit.items()
    })

    if Version(pint.__version__) < Version('0.24'):
        args = (unit.items(),)
    else:
        args = (
            ((u, e) for u, e in unit.items() if e >= 0),
            ((u, e) for u, e in unit.items() if e < 0),
        )

    out = pint.formatter(
        *args,
        as_ratio=False,
        product_fmt=" ",
        power_fmt="{}{}"
    )
    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")

Should gives the same result as before for all versions of pint (except 0.24.0, of course). In pint < 0.24, a dimensionless unit will yield "dimensionless" because in the older pints, the custom formatter isn't even called in that case. In pint 0.24.1, this was modified and the special case is managed at the beginning of this proposition.

I am using packaging.version.Version to ensure backward compatibility, which is not the elegantest but seems fine. Also, I assumed it was reasonable to use the "private" function registry._get_symbol.

@dcherian
Copy link
Contributor

dcherian commented Jun 28, 2024

Nice work! Happy to merge this change. Let's also add a pint!=0.24.0 version pin.

Just curious, why out = out.replace("Δ°", "delta_deg")

@aulemahal
Copy link
Contributor Author

The delta_deg has to do with https://pint.readthedocs.io/en/stable/user/nonmult.html, i.e. when two temperatures are subtracted, the result is not a temperature but rather a "delta". And the short symbol for that is the greek letter.

Both "Δ°C" and "delta_degC" are not recognized by udunits, so my guess is that this is to avoid unicode problems down the line in a netCDF ?

I recently stumbled upon these delta in xclim and I added an automatic translation into the related absolute unit in the specific function. (Δ°C -> K, Δ°F -> °R) But maybe that's a bit too strong for cf-xarray...

@dcherian
Copy link
Contributor

Let's add that as a comment in the code when you send in your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working opinion wanted User input requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants