
occasional bad data files when writing with parallel zlib #1710

Closed
edwardhartnett opened this issue May 5, 2020 · 3 comments

@edwardhartnett
Contributor

In the NOAA GFS system, we have a problem with files written with parallel compression: occasionally they are unreadable. Recreating the file fixes the problem.

The problem occurs on read, with this error:

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5Dio.c line 199 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 603 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 2293 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 3658 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #004: H5Z.c line 1326 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
  #005: H5Zdeflate.c line 123 in H5Z_filter_deflate(): inflate() failed
    major: Data filters
    minor: Unable to initialize object
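The last frame says that zlib's inflate() rejected the bytes of a stored chunk, which points at the data as written to disk rather than at the reader. The snippet below is an illustration using Python's standard-library zlib module, not HDF5 itself. It shows the same failure mode: intact deflate bytes round-trip, while a truncated (stand-in for garbled) chunk cannot be inflated at all. The chunk contents and compression level are arbitrary choices for the sketch.

```python
import zlib
from typing import Optional


def try_inflate(chunk: bytes) -> Optional[bytes]:
    """Return the decompressed chunk, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(chunk)
    except zlib.error:
        return None


original = bytes(range(256)) * 64        # stand-in for one raw data chunk
compressed = zlib.compress(original, 1)  # deflate at level 1 (arbitrary here)

good = try_inflate(compressed)
bad = try_inflate(compressed[: len(compressed) // 2])  # simulate a damaged chunk

print(good == original, bad is None)  # True True
```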

I am investigating further...

@WardF
Member

WardF commented May 5, 2020

Thanks Ed, 'bad data' always grabs my attention. Watching this issue closely.

@edwardhartnett
Contributor Author

OK, the good news is that this is only happening on one machine, so there's a good chance it is the result of a build issue. We're going to rebuild the I/O stack and see if we can reproduce the problem...

@edwardhartnett
Contributor Author

This turned out to be caused by mixing shared libraries from netcdf-c 4.7.4 and 4.7.2. Once they resolved their build issues, the problem went away. ;-)
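One way to catch this kind of mix-up early is to compare the netCDF version a program was built against with the version it actually reports at run time. In C, the run-time string comes from nc_inq_libvers() and the compile-time one from the NC_VERSION macro in netcdf_meta.h; `nc-config --version` reports the installed version from the shell. The sketch below only assumes you have collected those strings somehow. The helper names (parse_version, mismatched) and the sample strings are hypothetical, chosen for illustration.

```python
import re

# Finds the first "major.minor.patch" triple, so it accepts strings shaped like
# "4.7.4 of <build date> $" (nc_inq_libvers style) or "netCDF 4.7.4" (nc-config style).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) as ints from a version string."""
    m = _VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in m.groups())


def mismatched(expected: str, reported: dict) -> list:
    """Return the sorted names of components whose version differs from `expected`."""
    want = parse_version(expected)
    return sorted(name for name, text in reported.items()
                  if parse_version(text) != want)


# Hypothetical report from one build: headers from 4.7.2, library from 4.7.4.
reported = {
    "netcdf.h at compile time (NC_VERSION)": "4.7.2",
    "libnetcdf at run time (nc_inq_libvers)": "4.7.4 of <build date> $",
}
print(mismatched("4.7.4", reported))  # ['netcdf.h at compile time (NC_VERSION)']
```

A non-empty result means at least one piece of the I/O stack disagrees with the version you intended to deploy.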
