Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

too much API breakage - mandatory use of semantic versioning #2

Closed
conradsnicta opened this issue Jun 25, 2020 · 6 comments
Closed
Assignees

Comments

@conradsnicta
Copy link

HDF5 v1.12 has changed so much since the initial release of HDF5 that it's more appropriate to call it HDF6 at this stage.

The breaking API changes between minor HDF5 releases (eg. 1.8, 1.10, 1.12) are rather disconcerting. While there are various pre-processor tricks used as "workarounds", they are just brittle hacks in the end. Stuff like H5_USE_110_API is just papering over the unwarranted breakage.

The point of HDF is to be a boring abstraction for storage functionality, where by "boring" I mean stable. HDF5 is definitely not boring, which decreases its value and utility.

The use of semantic versioning to label HDF releases would help in this regard immensely. Minor releases such as v1.12 should not be introducing any kind of API breaks. Such breaks are much better communicated by bumping the major version number, for example HDF6, HDF7, etc.

The current approach of API breakage between minor releases is causing all sorts of issues. Examples:

TLDR: I would suggest that the HDF5 developers be made explicitly aware that public API breaks in HDF5 releases are not exactly conducive to productivity. It would be useful to properly communicate such changes via an increase in the major version number, for example HDF6. The current practice of updating "HDF5" with incompatible changes is overall a negative net contribution.

seanm added a commit to seanm/hdf5 that referenced this issue Jun 29, 2020
…inator

Found by ASan:

==18515==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200001a791 at pc 0x00010111b47a bp 0x7ffeefbfe8f0 sp 0x7ffeefbfe098
READ of size 2 at 0x60200001a791 thread T0
    #0 0x10111b479 in wrap_strlen (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x14479)
    HDFGroup#1 0x100841c4c in H5T__conv_vlen H5Tconv.c:3193
    HDFGroup#2 0x1008113d9 in H5T_convert H5T.c:5024
    HDFGroup#3 0x100379fbb in H5D__scatgath_write H5Dscatgath.c:701
    HDFGroup#4 0x10032d235 in H5D__contig_write H5Dcontig.c:657
    HDFGroup#5 0x10036c768 in H5D__write H5Dio.c:819
    HDFGroup#6 0x10036b650 in H5Dwrite H5Dio.c:335
    HDFGroup#7 0x100115fa7 in H5::DataSet::write(void const*, H5::DataType const&, H5::DataSpace const&, H5::DataSpace const&, H5::DSetMemXferPropList const&) const H5DataSet.cpp:506
    HDFGroup#8 0x100031d9a in test_vlstrings tvlstr.cpp:191
    HDFGroup#9 0x1001aec42 in PerformTests testframe.c:323
    HDFGroup#10 0x100002ac0 in main testhdf5.cpp:116
    HDFGroup#11 0x7fff5eec6014 in start (libdyld.dylib:x86_64+0x1014)

0x60200001a791 is located 0 bytes to the right of 1-byte region [0x60200001a790,0x60200001a791)
allocated by thread T0 here:
    #0 0x10115c547 in wrap_calloc (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x55547)
    HDFGroup#1 0x100031d70 in test_vlstrings tvlstr.cpp:187
    HDFGroup#2 0x1001aec42 in PerformTests testframe.c:323
    HDFGroup#3 0x100002ac0 in main testhdf5.cpp:116
    HDFGroup#4 0x7fff5eec6014 in start (libdyld.dylib:x86_64+0x1014)
seanm added a commit to seanm/hdf5 that referenced this issue Jun 29, 2020
…H5Rdereference2()

Found by ASan:

==52756==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffeefbfedc8 at pc 0x000100be250f bp 0x7ffeefbfe890 sp 0x7ffeefbfe888
READ of size 1 at 0x7ffeefbfedc8 thread T0
    #0 0x100be250e in H5R__dereference H5Rint.c:416
    HDFGroup#1 0x100bddbd1 in H5Rdereference2 H5R.c:185
    HDFGroup#2 0x1001c168c in test_reference_region trefer.c:798
    HDFGroup#3 0x1001ba134 in test_reference trefer.c:1863
    HDFGroup#4 0x100660c42 in PerformTests testframe.c:323
    HDFGroup#5 0x100001fdc in main testhdf5.c:77
    HDFGroup#6 0x7fff5eec6014 in start (libdyld.dylib:x86_64+0x1014)

Address 0x7ffeefbfedc8 is located in stack of thread T0 at offset 616 in frame
    #0 0x1001be1df in test_reference_region trefer.c:507
seanm added a commit to seanm/hdf5 that referenced this issue Jun 29, 2020
… for the actual data

Found by ASan:

==19994==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffeefbfe920 at pc 0x0001010b064f bp 0x7ffeefbfc810 sp 0x7ffeefbfbfc0
READ of size 128 at 0x7ffeefbfe920 thread T0
    #0 0x1010b064e in __asan_memcpy (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x5264e)
    HDFGroup#1 0x1002cde61 in H5D__gather_mem H5Dscatgath.c:428
    HDFGroup#2 0x1002d0ca4 in H5D__scatgath_write H5Dscatgath.c:663
    HDFGroup#3 0x10025c948 in H5D__chunk_write H5Dchunk.c:2470
    HDFGroup#4 0x1002c3768 in H5D__write H5Dio.c:819
    HDFGroup#5 0x1002c2650 in H5Dwrite H5Dio.c:335
    HDFGroup#6 0x1000d4b8a in H5TBwrite_fields_index H5TB.c:729
    HDFGroup#7 0x100008c13 in test_table test_table.c:1420
    HDFGroup#8 0x1000028ff in main test_table.c:1686
    HDFGroup#9 0x7fff5eec6014 in start (libdyld.dylib:x86_64+0x1014)

Address 0x7ffeefbfe920 is located in stack of thread T0 at offset 4160 in frame
    #0 0x1000029cf in test_table test_table.c:211
@ArchangeGabriel
Copy link

HDF6 is not a correct major version number change, HDF5 2.x would be. AFAIK the 5 in HDF5 relates to the storage specification, and that is stable AFAICT. What is changing here is the API of the official library for accessing this file format, and this library has version 1.12.x when it should likely be 2.x.y.

@scivision
Copy link
Contributor

Also msys2 / h5py is broken with hdf5 1.12

@conradsnicta
Copy link
Author

HDF6 is not a correct major version number change, HDF5 2.x would be. AFAIK the 5 in HDF5 relates to the storage specification, and that is stable AFAICT. What is changing here is the API of the official library for accessing this file format, and this library has version 1.12.x when it should likely be 2.x.y.

This is probably a good trade-off.

@tbeu
Copy link
Contributor

tbeu commented Jan 10, 2021

Same holds for libmatio where I needed to make the appropriate changes to support HDF5 v1.8x, v1.10.x and v1.12.0.

@FRidh FRidh mentioned this issue Apr 4, 2021
10 tasks
vchoi-hdfgroup added a commit that referenced this issue Apr 22, 2021
vchoi-hdfgroup pushed a commit that referenced this issue Jan 5, 2022
    See Kent's documentation "Designed to Fail Tests and Issues".
    (a) Fix for issue #2:
    --Print out meaningful message about max_lag when there is
    checksum error in loading an entry via H5C__load_entry().
    --H5C.c: H5C_protect()
    (b) Fix for issue #4:
    --Allocate more space when the copy of the index read from the metadata file is
      bigger than the existing size
    --H5Fvfd_swmr.c: H5F_vfd_swmr_reader_end_of_tick()
(B) When putting the old index into the delayed free list, use the old
    writer_index_offset instead of the current writer_index_offset
    --H5Fvfd_swmr.c: vfd_swmr_enlarge_shadow_index()
(C) When there is error form calling H5F_vfd_swmr_process_eot_queue() in
    VFD_SWMR_ENTER(err) and VFD_SWMR_LEAVE(err), should report
    FAIL instead of "err"
    --H5private.h: VFD_SWMR_ENTER and VFD_SWMR_LEAVE macros
(D) Add tests to verify issue #2 and issue #4 are fixed.
@derobins derobins self-assigned this May 27, 2022
@derobins
Copy link
Member

derobins commented Jul 7, 2022

Proper semantic versioning will begin with HDF5 2.0.0, which will probably release sometime in 2023, after 1.14.0 releases.

@derobins derobins closed this as completed Jul 7, 2022
@conradsnicta
Copy link
Author

The eventual use of semantic versioning with HDF5 2.0.0 is certainly welcome, though it should have been applied much earlier.

Why not simply skip 1.14 and go directly to 2.0? The sooner HDF5 gets out of this mess, the better, saving everybody a metric ton of headaches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants