Add bitgroom support to NCZarr #2140

Closed
wants to merge 14 commits into from
7 changes: 6 additions & 1 deletion CMakeLists.txt
@@ -1753,10 +1753,15 @@ CHECK_FUNCTION_EXISTS(atexit HAVE_ATEXIT)

# Control invoking nc_finalize at exit
OPTION(ENABLE_ATEXIT_FINALIZE "Invoke nc_finalize at exit." ON)
IF(ENABLE_ATEXIT_FINALIZE)
IF(NOT HAVE_ATEXIT)
IF(ENABLE_ATEXIT_FINALIZE AND NOT HAVE_ATEXIT)
SET(ENABLE_ATEXIT_FINALIZE OFF CACHE BOOL "Enable ATEXIT" FORCE)
MESSAGE(WARNING "ENABLE_ATEXIT_FINALIZE set but atexit() function not defined")
ELSE()
IF(MSVC)
SET(ENABLE_ATEXIT_FINALIZE OFF CACHE BOOL "Enable ATEXIT" FORCE)
MESSAGE(WARNING "ENABLE_ATEXIT_FINALIZE not supported under Windows")
ENDIF()
ENDIF()
ENDIF()
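The `ENABLE_ATEXIT_FINALIZE` option above decides whether nc_finalize is invoked at exit, and the CMake logic turns it off when `atexit()` is missing or the compiler is MSVC. As a minimal sketch of the mechanism being gated, assuming nothing about how netcdf-c wires it internally, the following registers a cleanup handler with the standard C `atexit()`. `finalize_library` and `register_finalizer` are hypothetical stand-ins, not netcdf-c functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set once the hypothetical finalizer has run, so repeated calls are no-ops. */
static int finalized = 0;

/* Hypothetical stand-in for a library finalizer such as nc_finalize(). */
static void finalize_library(void)
{
    if (finalized)
        return;
    finalized = 1;
    fputs("library finalized\n", stderr);
}

/* Register the finalizer to run at normal program exit.
   Returns 0 on success, -1 if atexit() reports failure. */
static int register_finalizer(void)
{
    return atexit(finalize_library) == 0 ? 0 : -1;
}
```

Calling `register_finalizer()` once near the start of `main` is enough; the guard flag keeps an explicit earlier call to the finalizer from running the cleanup a second time when the process exits.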

2 changes: 2 additions & 0 deletions RELEASE_NOTES.md
@@ -7,13 +7,15 @@ This file contains a high-level description of this package's evolution. Release

## 4.8.2 - TBD

* [Enhancement] Add bitgroom support to NCZarr. See [Github #2140](https://github.com/Unidata/netcdf-c/pull/2140)
* [Enhancement] Added options to suppress the new behavior from [Github #2135](https://github.com/Unidata/netcdf-c/pull/2135). The options for `cmake` and `configure` are, respectively, `-DENABLE_LIBXML2` and `--(enable/disable)-libxml2`. Both of these options default to 'on/enabled'. When disabled, the bundled `ezxml` XML interpreter is used regardless of whether `libxml2` is present on the system.
* [Enhancement] Support optional use of libxml2, otherwise default to ezxml. See [Github #2135](https://github.com/Unidata/netcdf-c/pull/2135) -- H/T to [Egbert Eich](https://github.com/e4t).
* [Bug Fix] Fix several os related errors. See [Github #2138](https://github.com/Unidata/netcdf-c/pull/2138).
* [Enhancement] Support byte-range reading of netcdf-3 files stored in private buckets in S3. See [Github #2134](https://github.com/Unidata/netcdf-c/pull/2134)
* [Enhancement] Support Amazon S3 access for NCZarr. Also support use of the existing Amazon SDK credentials system. See [Github #2114](https://github.com/Unidata/netcdf-c/pull/2114)
* [Bug Fix] Fix string allocation error in H5FDhttp.c. See [Github #2127](https://github.com/Unidata/netcdf-c/pull/2127).
* [Bug Fix] Apply patches for ezxml and for selected oss-fuzz detected errors. See [Github #2125](https://github.com/Unidata/netcdf-c/pull/2125).
* [Enhancement] Support Amazon S3 access for NCZarr. Also support use of the existing Amazon SDK credentials system. See [Github #2114](https://github.com/Unidata/netcdf-c/pull/2114)
* [Bug Fix] Ensure that internal Fortran APIs are always defined. See [Github #2098](https://github.com/Unidata/netcdf-c/pull/2098).
* [Enhancement] Support filters for NCZarr. See [Github #2101](https://github.com/Unidata/netcdf-c/pull/2101)
* [Bug Fix] Make PR 2075 long file name be idempotent. See [Github #2094](https://github.com/Unidata/netcdf-c/pull/2094).
2 changes: 1 addition & 1 deletion configure.ac
@@ -1062,7 +1062,7 @@ AC_MSG_CHECKING([whether to search for and use external libxml2])
AC_ARG_ENABLE([libxml2],
[AS_HELP_STRING([--disable-libxml2],
[disable detection and use of libxml2 in favor of the bundled ezxml interpreter])])
test "x$disable_libxml2" = xyes && enable_libxml2=no
test "x$enable_libxml2" = xno || enable_libxml2=yes
AC_MSG_RESULT($enable_libxml2)

AC_SUBST([XMLPARSER],"ezxml")
23 changes: 22 additions & 1 deletion docs/nczarr.md
@@ -293,6 +293,28 @@ Examples of currently unsupported types are as follows:

Again, this list should diminish over time.

# NCZarr versus netCDF-4. {#nczarr_netcdf4}

If ncgen is used to create both a netCDF-4 file and an NCZarr store using
the same .cdl file, then some differences may be observed.

## _FillValue
The Zarr format stores the fill value as part of the .zarray metadata,
while netCDF-4 stores it in the _FillValue attribute. The .zattrs for that
array may also contain a _FillValue attribute, so in NCZarr the
fill value may occur in two places.

The rule is that if nc_def_var_fill was called, or the .cdl file defines the _FillValue attribute,
then that attribute will appear in the .zattrs metadata; otherwise it will not.
However, if the fill_value key is defined in .zarray, then it is used in place of the _FillValue attribute.

If a Zarr store created by some other Zarr implementation is read, then
the fill_value key may be set, but there will probably not be any _FillValue attribute.
As above, the fill_value key will then be used.

The net result is that NCZarr stores will carry the fill value and use it in subsequent
reads and writes.
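The precedence rule above can be expressed as a small resolver. This is a hypothetical model, not NCZarr's code: `struct fill_sources` and `effective_fill` are invented names, and values are held as `double` only to keep the example type-agnostic. The caller supplies the type's default fill for the case where neither source is present.

```c
#include <stdbool.h>

/* Hypothetical model of the two places an NCZarr array can carry a fill value. */
struct fill_sources {
    bool   has_fill_value_key;  /* "fill_value" key present in .zarray */
    double fill_value_key;
    bool   has_fillvalue_attr;  /* _FillValue attribute present in .zattrs */
    double fillvalue_attr;
};

/* Resolve the effective fill value: the .zarray fill_value key wins,
   then the _FillValue attribute, then the caller-supplied type default. */
static double effective_fill(const struct fill_sources *src, double type_default)
{
    if (src->has_fill_value_key)
        return src->fill_value_key;
    if (src->has_fillvalue_attr)
        return src->fillvalue_attr;
    return type_default;
}
```

A store written by another Zarr implementation typically corresponds to `has_fill_value_key = true` with no attribute, which the first branch handles.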

# Notes on Debugging NCZarr Access {#nczarr_debug}

The NCZarr support has a trace facility.
@@ -320,7 +342,6 @@ aws_secret_access_key=YYYY...
```
See Appendix E for additional information.


## Addressing Style

The notion of "addressing style" may need some expansion.
8 changes: 5 additions & 3 deletions libdispatch/dpathmgr.c
@@ -678,7 +678,7 @@ parsepath(const char* inpath, struct Path* path)
&& (tmp1[0] == '/')
&& strchr(windrive,tmp1[1]) != NULL
&& (tmp1[2] == '/' || tmp1[2] == '\0')) {
/* Assume this is a mingw path */
/* Assume this is a msys path */
path->drive = tmp1[1];
/* Remainder */
if(tmp1[2] == '\0')
@@ -869,10 +869,12 @@ static int
getlocalpathkind(void)
{
int kind = NCPD_UNKNOWN;
#ifdef __CYGWIN__
#if defined __CYGWIN__
kind = NCPD_CYGWIN;
#elif defined __MINGW32__
kind = NCPD_WIN;
kind = NCPD_WIN; /* Do not understand the relationship of MSYS to MINGW */
#elif defined __MSYS__
kind = NCPD_MSYS;
#elif defined _MSC_VER /* not _WIN32 */
kind = NCPD_WIN;
#elif defined __MSYS__
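The `parsepath` hunk recognizes an msys-style path such as `/c/Users` by checking for a leading `/`, a drive letter, and then `/` or end of string. A minimal standalone sketch of that check follows; the real `windrive` string and any conditions preceding the `&&` chain are not shown in this diff, so the letter set and the `msys_drive` name are assumptions. One detail worth noting: `strchr(s, '\0')` returns a pointer to the terminator rather than NULL, so the sketch rejects an empty second character explicitly.

```c
#include <string.h>

/* Hypothetical set of acceptable drive letters (assumption). */
static const char *windrive =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Return the drive letter if path looks like "/c/..." or "/c",
   otherwise return 0. */
static char msys_drive(const char *path)
{
    if (path == NULL || path[0] != '/' || path[1] == '\0')
        return 0;                       /* need at least "/x" */
    if (strchr(windrive, path[1]) == NULL)
        return 0;                       /* second char is not a drive letter */
    if (path[2] != '/' && path[2] != '\0')
        return 0;                       /* "/usr" is not a drive path */
    return path[1];
}
```

Note that this shape check alone cannot distinguish a one-letter top-level directory such as `/x/...` on a POSIX system from a drive path, which is presumably why the surrounding code also consults the platform kind returned by `getlocalpathkind`.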
27 changes: 16 additions & 11 deletions libncxml/Makefile.am
@@ -32,19 +32,24 @@ EXTRA_DIST = CMakeLists.txt license.txt
REPO=https://downloads.sourceforge.net/project/ezxml/
EZXML=ezxml-0.8.6.tar.gz
makelib::
rm -fr ./ezxml.[ch] ./license.txt ./ezxml
tar -zxf ./${EZXML}
echo '#define EZXML_NOMMAP 1' > ezxml.c
cat <ezxml/ezxml.c | \
sed -e '/<unistd.h>/d' | \
sed -e 's|//\(.*\)|/*\1*/|' \
sed -e 's|//\(.*\)|/*\1*/|' \
cat >./ezxml.c
sed -e 's|//\(.*\)|/*\1*/|' <ezxml/ezxml.h >./ezxml.h
sed -i .bak 's/<fcntl.h>/<fcntl.h>\n#ifdef HAVE_UNISTD_H\n#include <unistd.h>\n#endif/g' ezxml.h
rm ezxml.h.bak
echo "WARNING DO NOT RUN THIS since the patches are not in the tar file"
exit 1
rm -fr ./ezxml.[ch] ./license.txt ./ezxml ./ezxml.c.? ./ezxml.h.?
tar -zxf ${EZXML}
cat ezxml/ezxml.h > ./ezxml.h
sed -i.1 -e '/ezxml_parse_fd/d' ezxml.h
sed -i.2 -e '/ezxml_parse_file/d' ezxml.h
sed -i.3 -e '/ezxml_parse_fp/d' ezxml.h
sed -i.4 -e 's|//\(.*\)|/*\1*/|' ezxml.h
echo "#define EZXML_NOMMAP 1" > ./ezxml.c
cat ezxml/ezxml.c >> ./ezxml.c
sed -i.1 -e '/<unistd.h>/d' ezxml.c
sed -i.2 -e '/ezxml_parse_fp(FILE/i#if 0' ./ezxml.c
sed -i.3 -e '/ezxml_ampencode(const/i#endif //0' ./ezxml.c
sed -i.4 -e 's|//\(.*\)|/*\1*/|' ezxml.c
cp ezxml/license.txt .
rm -fr ezxml
rm -f ezxml.c.? ezxml.h.?

# Define path to the xml github dir; this value assumes it is in a parallel directory to netcdf-c (YMMV)
GITSRC=${top_srcdir}/../tinyxml2
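Each `sed -e 's|//\(.*\)|/*\1*/|'` step in the `makelib` recipe turns a C++-style `//` comment into a C89 block comment. A rough C equivalent, written only to illustrate what the expression does to a single line (`line_comment_to_block` is a hypothetical name):

```c
#include <stdio.h>
#include <string.h>

/* Replace the first "//" on a line, and everything after it, with a
   block comment, mirroring the makelib sed expression.
   Writes the result into out (outsz bytes).
   Returns 0 on success, -1 if out is too small. */
static int line_comment_to_block(const char *in, char *out, size_t outsz)
{
    const char *p = strstr(in, "//");
    int n;

    if (p == NULL)
        n = snprintf(out, outsz, "%s", in);   /* no comment: copy as-is */
    else
        n = snprintf(out, outsz, "%.*s/*%s*/", (int)(p - in), in, p + 2);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Like the sed expression, this does not understand string literals, so a line containing `"http://..."` would also be rewritten. Applied to the `#endif //0` line inserted by `sed -i.3`, it yields `#endif /*0*/`, the form that appears in the ezxml.c hunk.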
2 changes: 2 additions & 0 deletions libncxml/ezxml.c
@@ -623,6 +623,7 @@ ezxml_t ezxml_parse_str(char *s, size_t len)
/* Wrapper for ezxml_parse_str() that accepts a file stream. Reads the entire*/
/* stream into memory and then parses it. For xml files, use ezxml_parse_file()*/
/* or ezxml_parse_fd()*/
#if 0
ezxml_t ezxml_parse_fp(FILE *fp)
{
ezxml_root_t root;
@@ -688,6 +689,7 @@ ezxml_t ezxml_parse_file(const char *file)

/* Encodes ampersand sequences appending the results to *dst, reallocating *dst*/
/* if length excedes max. a is non-zero for attribute encoding. Returns *dst*/
#endif /*0*/
char *ezxml_ampencode(const char *s, size_t len, char **dst, size_t *dlen,
size_t *max, short a)
{
133 changes: 62 additions & 71 deletions libncxml/ezxml.h
@@ -29,142 +29,133 @@
#include <stdio.h>
#include <stdarg.h>
#include <fcntl.h>
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

#ifdef __cplusplus
extern "C" {
#endif

#define EZXML_BUFSIZE 1024 /* size of internal memory buffers*/
#define EZXML_NAMEM 0x80 /* name is malloced*/
#define EZXML_TXTM 0x40 /* txt is malloced*/
#define EZXML_DUP 0x20 /* attribute name and value are strduped*/
#define EZXML_BUFSIZE 1024 /* size of internal memory buffers */
#define EZXML_NAMEM 0x80 /* name is malloced */
#define EZXML_TXTM 0x40 /* txt is malloced */
#define EZXML_DUP 0x20 /* attribute name and value are strduped */

typedef struct ezxml *ezxml_t;
struct ezxml {
char *name; /* tag name*/
char **attr; /* tag attributes { name, value, name, value, ... NULL }*/
char *txt; /* tag character content, empty string if none*/
size_t off; /* tag offset from start of parent tag character content*/
ezxml_t next; /* next tag with same name in this section at this depth*/
ezxml_t sibling; /* next tag with different name in same section and depth*/
ezxml_t ordered; /* next tag, same section and depth, in original order*/
ezxml_t child; /* head of sub tag list, NULL if none*/
ezxml_t parent; /* parent tag, NULL if current tag is root tag*/
short flags; /* additional information*/
char *name; /* tag name */
char **attr; /* tag attributes { name, value, name, value, ... NULL } */
char *txt; /* tag character content, empty string if none */
size_t off; /* tag offset from start of parent tag character content */
ezxml_t next; /* next tag with same name in this section at this depth */
ezxml_t sibling; /* next tag with different name in same section and depth */
ezxml_t ordered; /* next tag, same section and depth, in original order */
ezxml_t child; /* head of sub tag list, NULL if none */
ezxml_t parent; /* parent tag, NULL if current tag is root tag */
short flags; /* additional information */
};

/* Given a string of xml data and its length, parses it and creates an ezxml*/
/* structure. For efficiency, modifies the data by adding null terminators*/
/* and decoding ampersand sequences. If you don't want this, copy the data and*/
/* pass in the copy. Returns NULL on failure.*/
/* Given a string of xml data and its length, parses it and creates an ezxml */
/* structure. For efficiency, modifies the data by adding null terminators */
/* and decoding ampersand sequences. If you don't want this, copy the data and */
/* pass in the copy. Returns NULL on failure. */
ezxml_t ezxml_parse_str(char *s, size_t len);

/* A wrapper for ezxml_parse_str() that accepts a file descriptor. First*/
/* attempts to mem map the file. Failing that, reads the file into memory.*/
/* Returns NULL on failure.*/
ezxml_t ezxml_parse_fd(int fd);

/* a wrapper for ezxml_parse_fd() that accepts a file name*/
ezxml_t ezxml_parse_file(const char *file);
/* A wrapper for ezxml_parse_str() that accepts a file descriptor. First */
/* attempts to mem map the file. Failing that, reads the file into memory. */
/* Returns NULL on failure. */

/* Wrapper for ezxml_parse_str() that accepts a file stream. Reads the entire*/
/* stream into memory and then parses it. For xml files, use ezxml_parse_file()*/
/* or ezxml_parse_fd()*/
ezxml_t ezxml_parse_fp(FILE *fp);

/* Wrapper for ezxml_parse_str() that accepts a file stream. Reads the entire */

/* returns the first child tag (one level deeper) with the given name or NULL*/
/* if not found*/
/* returns the first child tag (one level deeper) with the given name or NULL */
/* if not found */
ezxml_t ezxml_child(ezxml_t xml, const char *name);

/* returns the next tag of the same name in the same section and depth or NULL*/
/* if not found*/
/* returns the next tag of the same name in the same section and depth or NULL */
/* if not found */
#define ezxml_next(xml) ((xml) ? xml->next : NULL)

/* Returns the Nth tag with the same name in the same section at the same depth*/
/* or NULL if not found. An index of 0 returns the tag given.*/
/* Returns the Nth tag with the same name in the same section at the same depth */
/* or NULL if not found. An index of 0 returns the tag given. */
ezxml_t ezxml_idx(ezxml_t xml, int idx);

/* returns the name of the given tag*/
/* returns the name of the given tag */
#define ezxml_name(xml) ((xml) ? xml->name : NULL)

/* returns the given tag's character content or empty string if none*/
/* returns the given tag's character content or empty string if none */
#define ezxml_txt(xml) ((xml) ? xml->txt : "")

/* returns the value of the requested tag attribute, or NULL if not found*/
/* returns the value of the requested tag attribute, or NULL if not found */
const char *ezxml_attr(ezxml_t xml, const char *attr);

/* Traverses the ezxml sturcture to retrieve a specific subtag. Takes a*/
/* variable length list of tag names and indexes. The argument list must be*/
/* terminated by either an index of -1 or an empty string tag name. Example: */
/* title = ezxml_get(library, "shelf", 0, "book", 2, "title", -1);*/
/* This retrieves the title of the 3rd book on the 1st shelf of library.*/
/* Returns NULL if not found.*/
/* Traverses the ezxml sturcture to retrieve a specific subtag. Takes a */
/* variable length list of tag names and indexes. The argument list must be */
/* terminated by either an index of -1 or an empty string tag name. Example: */
/* title = ezxml_get(library, "shelf", 0, "book", 2, "title", -1); */
/* This retrieves the title of the 3rd book on the 1st shelf of library. */
/* Returns NULL if not found. */
ezxml_t ezxml_get(ezxml_t xml, ...);

/* Converts an ezxml structure back to xml. Returns a string of xml data that*/
/* must be freed.*/
/* Converts an ezxml structure back to xml. Returns a string of xml data that */
/* must be freed. */
char *ezxml_toxml(ezxml_t xml);

/* returns a NULL terminated array of processing instructions for the given*/
/* target*/
/* returns a NULL terminated array of processing instructions for the given */
/* target */
const char **ezxml_pi(ezxml_t xml, const char *target);

/* frees the memory allocated for an ezxml structure*/
/* frees the memory allocated for an ezxml structure */
void ezxml_free(ezxml_t xml);

/* returns parser error message or empty string if none*/
/* returns parser error message or empty string if none */
const char *ezxml_error(ezxml_t xml);

/* returns a new empty ezxml structure with the given root tag name*/
/* returns a new empty ezxml structure with the given root tag name */
ezxml_t ezxml_new(const char *name);

/* wrapper for ezxml_new() that strdup()s name*/
/* wrapper for ezxml_new() that strdup()s name */
#define ezxml_new_d(name) ezxml_set_flag(ezxml_new(strdup(name)), EZXML_NAMEM)

/* Adds a child tag. off is the offset of the child tag relative to the start*/
/* of the parent tag's character content. Returns the child tag.*/
/* Adds a child tag. off is the offset of the child tag relative to the start */
/* of the parent tag's character content. Returns the child tag. */
ezxml_t ezxml_add_child(ezxml_t xml, const char *name, size_t off);

/* wrapper for ezxml_add_child() that strdup()s name*/
/* wrapper for ezxml_add_child() that strdup()s name */
#define ezxml_add_child_d(xml, name, off) \
ezxml_set_flag(ezxml_add_child(xml, strdup(name), off), EZXML_NAMEM)

/* sets the character content for the given tag and returns the tag*/
/* sets the character content for the given tag and returns the tag */
ezxml_t ezxml_set_txt(ezxml_t xml, const char *txt);

/* wrapper for ezxml_set_txt() that strdup()s txt*/
/* wrapper for ezxml_set_txt() that strdup()s txt */
#define ezxml_set_txt_d(xml, txt) \
ezxml_set_flag(ezxml_set_txt(xml, strdup(txt)), EZXML_TXTM)

/* Sets the given tag attribute or adds a new attribute if not found. A value*/
/* of NULL will remove the specified attribute. Returns the tag given.*/
/* Sets the given tag attribute or adds a new attribute if not found. A value */
/* of NULL will remove the specified attribute. Returns the tag given. */
ezxml_t ezxml_set_attr(ezxml_t xml, const char *name, const char *value);

/* Wrapper for ezxml_set_attr() that strdup()s name/value. Value cannot be NULL*/
/* Wrapper for ezxml_set_attr() that strdup()s name/value. Value cannot be NULL */
#define ezxml_set_attr_d(xml, name, value) \
ezxml_set_attr(ezxml_set_flag(xml, EZXML_DUP), strdup(name), strdup(value))

/* sets a flag for the given tag and returns the tag*/
/* sets a flag for the given tag and returns the tag */
ezxml_t ezxml_set_flag(ezxml_t xml, short flag);

/* removes a tag along with its subtags without freeing its memory*/
/* removes a tag along with its subtags without freeing its memory */
ezxml_t ezxml_cut(ezxml_t xml);

/* inserts an existing tag into an ezxml structure*/
/* inserts an existing tag into an ezxml structure */
ezxml_t ezxml_insert(ezxml_t xml, ezxml_t dest, size_t off);

/* Moves an existing tag to become a subtag of dest at the given offset from*/
/* the start of dest's character content. Returns the moved tag.*/
/* Moves an existing tag to become a subtag of dest at the given offset from */
/* the start of dest's character content. Returns the moved tag. */
#define ezxml_move(xml, dest, off) ezxml_insert(ezxml_cut(xml), dest, off)

/* removes a tag along with all its subtags*/
/* removes a tag along with all its subtags */
#define ezxml_remove(xml) ezxml_free(ezxml_cut(xml))

#ifdef __cplusplus
}
#endif

#endif /* _EZXML_H*/
#endif /* _EZXML_H */
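The `attr` member of `struct ezxml` is documented as a flat, NULL-terminated array of alternating names and values. The sketch below walks that layout to find a value. `attr_lookup` is a hypothetical helper for illustration; ezxml's own `ezxml_attr()` may do more than this.

```c
#include <stddef.h>
#include <string.h>

/* Find the value for name in an array laid out as
   { name, value, name, value, ..., NULL }.
   Returns NULL if the array or name is NULL, or the name is absent. */
static const char *attr_lookup(char **attr, const char *name)
{
    size_t i;

    if (attr == NULL || name == NULL)
        return NULL;
    for (i = 0; attr[i] != NULL; i += 2) {   /* step over name/value pairs */
        if (strcmp(attr[i], name) == 0)
            return attr[i + 1];
    }
    return NULL;
}
```

Stepping by two relies on every name being followed by its value, which is the invariant the header comment describes.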
1 change: 1 addition & 0 deletions libnczarr/CMakeLists.txt
@@ -28,6 +28,7 @@ zodom.c
zopen.c
zprov.c
zsync.c
zload.c
ztype.c
zutil.c
zvar.c
1 change: 1 addition & 0 deletions libnczarr/Makefile.am
@@ -46,6 +46,7 @@ zodom.c \
zopen.c \
zprov.c \
zsync.c \
zload.c \
ztype.c \
zutil.c \
zvar.c \
1 change: 1 addition & 0 deletions libnczarr/zarr.h
@@ -72,6 +72,7 @@ EXTERNL int NCZ_create_fill_chunk(size64_t chunksize, size_t typesize, const voi
EXTERNL int NCZ_s3clear(NCS3INFO* s3map);
EXTERNL int NCZ_ischunkname(const char* name,char dimsep);
EXTERNL char* NCZ_chunkpath(struct ChunkKey key);
EXTERNL int ncz_rebuild_fill_chunk(NC_VAR_INFO_T* var);

/* zwalk.c */
EXTERNL int NCZ_read_chunk(int ncid, int varid, size64_t* zindices, void* chunkdata);
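The names `NCZ_create_fill_chunk` and `ncz_rebuild_fill_chunk` suggest that NCZarr keeps a chunk-sized buffer prefilled with the variable's fill value. The declaration above is truncated, so this sketch does not reproduce its signature. `make_fill_chunk` is hypothetical and assumes the chunk size is counted in elements rather than bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate nelems elements of typesize bytes each and
   copy the fill value into every element. Returns NULL on bad arguments,
   size overflow, or allocation failure. The caller must free() the result. */
static void *make_fill_chunk(size_t nelems, size_t typesize, const void *fill)
{
    unsigned char *buf;
    size_t i;

    if (nelems == 0 || typesize == 0 || fill == NULL)
        return NULL;
    if (nelems > SIZE_MAX / typesize)
        return NULL;                 /* nelems * typesize would overflow */
    buf = malloc(nelems * typesize);
    if (buf == NULL)
        return NULL;
    for (i = 0; i < nelems; i++)
        memcpy(buf + i * typesize, fill, typesize);
    return buf;
}
```

Caching such a buffer lets reads of never-written chunks be satisfied by a copy, and rebuilding it (as `ncz_rebuild_fill_chunk` presumably does) would be needed whenever the fill value or chunk shape changes.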