RFC: item() to return scalar for arrays with exactly 1 element. #815

Open
randolf-scholz opened this issue Jun 20, 2024 · 8 comments
Labels
API extension Adds new functions or objects to the API. Needs Discussion Needs further discussion. RFC Request for comments. Feature requests and proposed changes.

Comments

@randolf-scholz

def item(self) -> Scalar:
    """If the array contains exactly one element, return it as a scalar; otherwise raise ValueError."""

Examples:

Demo:

import pytest
import torch
import xarray as xr
import pandas as pd
import polars as pl
import numpy as np

@pytest.mark.parametrize("data", [[], [1, 2, 3]])
@pytest.mark.parametrize(
    "array_type", [torch.tensor, np.array, pd.Series, pd.Index, pl.Series, xr.DataArray]
)
def test_item_valueerror(data, array_type):
    array = array_type(data)
    with pytest.raises(ValueError):
        array.item()


@pytest.mark.parametrize(
    "array_type", [torch.tensor, np.array, pd.Series, pd.Index, pl.Series, xr.DataArray]
)
def test_item(array_type):
    array = array_type([1])
    array.item()

Currently, only torch fails, because it raises RuntimeError instead of ValueError.
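A minimal sketch of how calling code could smooth over that difference today, assuming one simply wants a ValueError regardless of the library. The name `item_strict` is hypothetical and not part of any library; note that it also converts RuntimeErrors raised for unrelated reasons, which may or may not be acceptable.

```python
def item_strict(array):
    """Call ``array.item()`` and report failures uniformly as ValueError.

    Hypothetical helper: it relies only on the object having an ``item()`` method.
    """
    try:
        return array.item()
    except RuntimeError as exc:
        # Some libraries signal "not exactly one element" with RuntimeError;
        # re-raise it as ValueError and keep the original as the cause.
        raise ValueError(str(exc)) from exc
```

With such a wrapper, both parametrized tests above could call `item_strict(array)` instead of `array.item()`.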

@vnmabus

vnmabus commented Jun 20, 2024

This was discussed in #710, along with the more general to_list, which also works for ND arrays.

@randolf-scholz
Author

item() is a bit different from to_list, and honestly I find it confusing that a method named to_list can return something that is not a list.

@rgommers rgommers added the API extension Adds new functions or objects to the API. label Jun 21, 2024
@rgommers
Member

.item() is more constrained than to_list indeed, and a bit cleaner. I checked other libraries - NumPy, PyTorch, JAX and CuPy implement .item(), Dask does not. (TF doesn't have it in the docs, so probably also not - but I can't check). CuPy/JAX do the transfer to CPU if the ndarray is on GPU.

This is a minor convenience method though, since float() & co work as well. They are clearer, since they are type-stable, and they also work for Dask. The only downside is that if you want some dtype-generic implementation to return a single element, you have to write a little utility that calls int/float/complex/bool as appropriate. Something like:

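def as_pyscalar(x):
    xp = x.__array_namespace__()
    dtype = x.dtype  # isdtype expects a dtype, not an array
    if xp.isdtype(dtype, 'real floating'):
        return float(x)
    elif xp.isdtype(dtype, 'complex floating'):
        return complex(x)
    elif xp.isdtype(dtype, 'integral'):
        return int(x)
    elif xp.isdtype(dtype, 'bool'):
        return bool(x)
    else:
        # raise error, or handle custom/non-standard dtypes if desired
        raise TypeError(f"unsupported dtype: {dtype}")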

Static typing of such a function, and of .item(), would also be a little annoying as it requires overloads.

@kgryte kgryte added RFC Request for comments. Feature requests and proposed changes. Needs Discussion Needs further discussion. labels Jun 21, 2024
@asmeurer
Member

item also works on arrays with multiple dimensions, whereas we decided to make it so float does not.

>>> np.array([1]).item()
1

@rgommers
Member

We discussed this in a call today and concluded that this falls into a bucket of functionality that is useful but also easy to implement on top of what's already in the standard. In addition, there are problems with trying to add it: an item() method is hard because it's missing in some libraries, and missing methods cannot be worked around in array-api-compat. If we did this, a function would be the way to go, but since no library has such a function yet, it would be new, hence more work and likely to incur resistance from array library maintainers.

Outcome:

  1. Create the array-api-extra package where this kind of function can live, and add it there (probably as as_pyscalar or a similarly descriptive name, not as item)
  2. Only reconsider adding it to the standard itself in the future if most/all array libraries have already added that function.

@randolf-scholz
Author

On a very fundamental level, I believe .item() makes no sense on DataFrame-like objects (pandas.DataFrame, polars.DataFrame, pyarrow.Table, etc.) because these are designed to represent heterogeneous data types.

From a mathematical PoV, item() acts on array-like data with homogeneous type, as a representation of the natural isomorphism V → K, when V is a 1-dimensional vector space over K.

@NeilGirdhar

NeilGirdhar commented Aug 13, 2024

Is this usage guaranteed?

If so, should it be added somewhere to the specification? I looked for it here.

FWIW I also like the item method since it's all I've ever needed and it's simpler than tolist. I wonder if it should be on the array namespace rather than the array: (def item(x: Array, /) -> complex | bool) since it can be implemented using the array's public interface. (This is a common test in OO design for what should be a method versus a bare function.)

@asmeurer
Member

asmeurer commented Aug 13, 2024

Yes, __float__ and so on are guaranteed (modulo the "lazy" note). See https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__float__.html#array_api.array.__float__. Though Ralf's helper should also include an if x.ndim != 1 or x.size != 1: raise ValueError check.
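For illustration, here is one way the dtype dispatch and a size check could be combined into a standalone function built only on the standard's public interface (`__array_namespace__`, `size`, `dtype`, `reshape`, `isdtype`). This is a sketch under assumptions, not an agreed design: it accepts any array with exactly one element regardless of rank, which matches the item() semantics proposed at the top of the thread rather than a stricter ndim check, and it reshapes to 0-D first because the scalar conversions are only specified for zero-dimensional arrays.

```python
def as_pyscalar(x):
    """Return the single element of ``x`` as a Python scalar.

    Raises ValueError unless ``x`` has exactly one element, and TypeError
    for dtypes outside the standard's bool/int/float/complex kinds.
    """
    xp = x.__array_namespace__()
    # ``size`` may be None when a dimension is unknown; that also fails here.
    if x.size != 1:
        raise ValueError(f"expected exactly one element, got size {x.size}")
    # Drop all axes so that bool()/int()/float()/complex() apply to a 0-D array.
    x0 = xp.reshape(x, ())
    dtype = x0.dtype
    if xp.isdtype(dtype, "bool"):
        return bool(x0)
    if xp.isdtype(dtype, "integral"):
        return int(x0)
    if xp.isdtype(dtype, "real floating"):
        return float(x0)
    if xp.isdtype(dtype, "complex floating"):
        return complex(x0)
    raise TypeError(f"unsupported dtype: {dtype}")
```

Any array object that implements `__array_namespace__` should be usable as the argument; libraries that lack it would need a compatibility layer such as array-api-compat.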
