Support float16 datatype #410

Open
eschnett opened this issue Nov 11, 2023 · 5 comments

Comments

@eschnett

I want to store float16 types in ndarrays. Would it be possible to extend scalar-datatype to allow for float16 and complex32 types?

I am specifically looking for the official float16 type (as available in many compilers and in CUDA) and not for bfloat16.

@braingram
Contributor

Thanks for bringing this up.

I'm currently undecided on adding these types to the standard (vs adding them via an extension). My main concerns about adding more datatypes to the ndarray standard are:

  • there are many possible datatypes and a standard that supports them all will become unwieldy
  • by including them in the standard, we create the expectation that implementations will need to support all of these datatypes (even if only a small subset are useful)
  • this increases the distance between the asdf-standard ndarray definition and the developing array-api standard

@perrygreenfield or @eslavich do either of you have some input on what motivated the current datatypes in ndarray?

The current datatypes do appear to be a relatively close match to the array-api standard:
https://data-apis.org/array-api/latest/API_specification/data_types.html
(except for bool8 in asdf-standard vs bool in array-api and the inclusion of ascii and ucs4 datatypes in asdf-standard).

I haven't looked into what would be required to "extend" the ndarray schema with an updated datatype (to provide a schema for an extension that could implement float16, complex32 etc). @eschnett have you looked into that and are there changes to the ndarray schema that might make that easier?

@eschnett
Author

I understand your concerns. Let me make an argument from a different point of view:

(1) The array api standard is a standard for Python. I am not really using Python; I'm usually using either C++ or Julia. Here is a list of datatypes supported by C++. That's a relatively small number of types, and these types will come up naturally in many circumstances. Julia supports float16, float32, and float64.

(2) It isn't really important to me whether this is part of the standard or part of an extension. However, asdftool should support these types in a relatively straightforward manner. When I look at the section on extensions, I am afraid that using an extension (not writing, just using, as an end user) is somewhat complicated, since extensions need to be installed. They probably need to be installed on every system that I am using, by every collaborator of mine, and in every CI setup that I'm using. I'm afraid that installing a set of extensions might end up significantly more complex than just apt install python-asdf or pip install asdf, and if that is the case, then the implementation overhead of adding float16 to the asdf library could be justified. For example, adding float16 support to asdf-cxx was a relatively small effort and didn't increase the boilerplate code by very much.

(3) Independent of the above, it would be very convenient if the content of ASDF files were accessible without loading an extension for datatypes. The HDF5 standard provides a way to define floating-point types, and contains generic code to convert between any kind of floating-point numbers. This allows reading any HDF5 file into any floating-point format. For example, reading float16 into a float32 in memory is possible, even if the local system does not support float16. I understand that this generic code might not be the most efficient, but such a feature would be valuable to have. I assume the layout of a floating-point number is defined in terms of position and length (in number of bits) of sign, mantissa, exponent, etc., with maybe a flag whether denormalized numbers etc. are supported. If ASDF were to add support for additional floating-point numbers, be it as an extension or not, then such a mechanism would be quite convenient.
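To make the idea of a generic, layout-driven conversion concrete, here is a minimal sketch in Python using only the standard library. It is not how HDF5 implements this; it only assumes an IEEE-754-style layout with the sign bit on top, then the exponent field, then the mantissa field. The names decode_float and decode_float16_le are hypothetical.

```python
import struct


def decode_float(bits, exp_bits, man_bits):
    """Decode an IEEE-754-style binary float from its integer bit pattern.

    Assumed layout (most to least significant bit):
    1 sign bit, `exp_bits` exponent bits, `man_bits` mantissa bits.
    """
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)

    if exponent == (1 << exp_bits) - 1:
        # All-ones exponent: infinity (zero mantissa) or NaN (otherwise).
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        # Zero or a denormalized (subnormal) number: no implicit leading 1.
        return sign * mantissa * 2.0 ** (1 - bias - man_bits)
    # Normal number: implicit leading 1 in front of the mantissa.
    return sign * (1 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)


def decode_float16_le(raw):
    """Decode two little-endian bytes as float16 (5 exponent, 10 mantissa bits)."""
    (bits,) = struct.unpack("<H", raw)
    return decode_float(bits, exp_bits=5, man_bits=10)
```

The same function handles float32 with exp_bits=8, man_bits=23, which is the point of describing a type by its field widths: a reader needs no native support for the stored format, only the description.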

@braingram
Contributor

To keep this conversation going (which I think is great!) I'm going to comment on a few (but not all) of the points raised.

re 1) float16 would be relatively easy for the python asdf library to support (as it's supported by numpy). complex32 is more difficult as it's not supported by numpy. asdf heavily relies on numpy for array handling and may need to define a new dtype and/or change how NDArrayType interacts with ndarray to support complex32 data. This is not to say that float16 and complex32 shouldn't be supported, I just wanted to highlight the difficulty this would present for the python library.
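As an illustration of what complex32 data could look like at the byte level, here is a small sketch that avoids NumPy entirely. It assumes, by analogy with complex64 being a pair of float32 values, that a complex32 value is stored as a little-endian (real, imaginary) pair of float16 values; that layout is an assumption here, not something the standard defines. The helper names are hypothetical, and the "e" format code is the standard library's half-precision code in struct.

```python
import struct


def pack_complex32(values):
    """Pack complex numbers as interleaved little-endian float16 (re, im) pairs."""
    flat = []
    for z in values:
        flat.extend((z.real, z.imag))
    return struct.pack(f"<{len(flat)}e", *flat)


def unpack_complex32(raw):
    """Inverse of pack_complex32: 4 bytes per complex value."""
    halves = struct.unpack(f"<{len(raw) // 2}e", raw)
    return [complex(re, im) for re, im in zip(halves[0::2], halves[1::2])]
```

Values that are not exactly representable in float16 are rounded on packing, so a round trip is only exact for inputs such as 1+2j or -0.5+0.25j.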

re 2) Are there commands in asdftool that you'd like to see support the new types? Many parts of this tool automatically detect any installed extensions (via python entry points) but there are certain parts of the tool (like diff) that use the 'raw' tree. Thankfully installing extensions is as easy as pip install asdf-astropy (for adding asdf support for a large number of astropy objects) via the magic of entry points. I'm happy to go into details (and expand the documentation if you see spots that left you with more questions).
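For readers unfamiliar with entry points, the following sketch lists whatever packages have registered themselves under a given entry point group, using only the standard library. The group name "asdf.extensions" used in the usage note is an assumption about how extension packages register; the helper name list_entry_points is hypothetical.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry point collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict mapping group name to entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)
```

Calling list_entry_points("asdf.extensions") returns an empty list when no extension package is installed, and picks up new names after a pip install without any further configuration.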

re 3) You definitely have me curious to look at the HDF5 standard and implementations to see how that's handled. Since ASDF blocks are just a collection of bytes it should be possible to make an extension that reads these bytes in and converts them to any other format. The python library does something like this for arrays where the block is read as a sequence of bytes and then converted to the dtype from the tree. Internally this is using the same extension API. The asdf-zarr extension also accesses the ASDF block data and converts it to types supported by zarr.
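The "read bytes, then convert using the tree" step can be sketched without any array library. This is not the asdf library's implementation; it only assumes that the tree entry supplies a datatype name, a byteorder of "little" or "big", and a shape, and it returns a flat list of Python values. The function decode_block and the table STRUCT_CODES are hypothetical.

```python
import struct
from math import prod

# Hypothetical mapping from datatype names to struct format codes.
STRUCT_CODES = {
    "int16": "h",
    "int32": "i",
    "int64": "q",
    "float16": "e",
    "float32": "f",
    "float64": "d",
}


def decode_block(raw, datatype, byteorder, shape):
    """Interpret raw block bytes as a flat list of values of `datatype`."""
    count = prod(shape)
    prefix = "<" if byteorder == "little" else ">"
    fmt = f"{prefix}{count}{STRUCT_CODES[datatype]}"
    if struct.calcsize(fmt) != len(raw):
        raise ValueError("block size does not match datatype and shape")
    return list(struct.unpack(fmt, raw))
```

Because the result is a list of ordinary Python floats, float16 data becomes readable even where no native half-precision type is available, at the cost of speed.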

@CagtayFabry

In light of this issue, would you also consider again adding support for datetime and timedelta dtypes as discussed in #270 ?

@braingram
Contributor

@eschnett I just merged #411 adding float16 to the standard. I'm not marking this issue as closed since complex32 is also discussed.

I am hoping to make a new asdf-standard release soon to allow the companion PR adding float16 to the python asdf library asdf-format/asdf#1692 to be brought out of draft and reviewed.
