Commit

Merge pull request #140 from JimBiardCics/string
Add support for variables of type string
dblodgett-usgs authored Sep 10, 2019
2 parents f56f374 + 3f0e55a commit 2da5bad
Showing 6 changed files with 93 additions and 28 deletions.
19 changes: 9 additions & 10 deletions apph.adoc
@@ -101,7 +101,7 @@ If the time series instances have the same number of elements and the time value
    alt:units = "m";
    alt:positive = "up";
    alt:axis = "Z";
  char station_name(station, name_strlen) ;
  string station_name(station) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
attributes:
@@ -184,7 +184,6 @@ When the intention of a data variable is to contain only a single time series, t
----
dimensions:
   time = 100233 ;
   name_strlen = 23 ;
variables:
   float lon ;
@@ -201,7 +200,7 @@ When the intention of a data variable is to contain only a single time series, t
       alt:units = "m";
       alt:positive = "up";
       alt:axis = "Z";
   char station_name(name_strlen) ;
   string station_name ;
       station_name:long_name = "station name" ;
       station_name:cf_role = "timeseries_id";
@@ -316,7 +315,7 @@ When the time series have different lengths and the data values for entire time
       alt:units = "m";
       alt:positive = "up";
       alt:axis = "Z";
   char station_name(station, name_strlen) ;
   string station_name(station) ;
       station_name:long_name = "station name" ;
       station_name:cf_role = "timeseries_id";
   int station_info(station) ;
@@ -708,7 +707,7 @@ When storing multiple trajectories in the same file, and the number of elements
name_strlen = 23 ;
variables:
   char trajectory(trajectory, name_strlen) ;
   string trajectory(trajectory) ;
     trajectory:cf_role = "trajectory_id";
     trajectory:long_name = "trajectory name" ;
   int trajectory_info(trajectory) ;
@@ -827,7 +826,7 @@ When the number of elements for each trajectory varies, and one can control the
name_strlen = 23 ;
variables:
   char trajectory(trajectory, name_strlen) ;
   string trajectory(trajectory) ;
         trajectory:cf_role = "trajectory_id";
   int rowSize(trajectory) ;
       rowSize:long_name = "number of obs for this trajectory " ;
@@ -886,7 +885,7 @@ When the number of elements at each trajectory vary, and the elements cannot be
   obs = UNLIMITED ;
   trajectory = 77 ;
name_strlen = 23 ;
variables:
   char trajectory(trajectory, name_strlen) ;
       trajectory:cf_role = "trajectory_id";
@@ -970,7 +969,7 @@ When storing time series of profiles at multiple stations in the same data varia
       lat:standard_name = "latitude";
       lat:long_name = "station latitude" ;
       lat:units = "degrees_north" ;
   char station_name(station, name_strlen) ;
   string station_name(station) ;
       station_name:cf_role = "timeseries_id" ;
       station_name:long_name = "station name" ;
   int station_info(station) ;
@@ -1067,7 +1066,7 @@ If there is only one station in the data variable, there is no need for the stat
   profile = 30 ;
   z = 42 ;
name_strlen = 23 ;
variables:
   float lon ;
       lon:standard_name = "longitude";
@@ -1148,7 +1147,7 @@ When the number of profiles and levels for each station varies, one can use a ra
   float alt(station) ;
       alt:long_name = "altitude above MSL" ;
       alt:units = "m" ;
   char station_name(station, name_strlen) ;
   string station_name(station) ;
       station_name:long_name = "station name" ;
       station_name:cf_role = "timeseries_id";
   int station_info(station) ;
45 changes: 43 additions & 2 deletions ch02.adoc
@@ -13,9 +13,50 @@ NetCDF files should have the file name extension "**`.nc`**".

=== Data Types

The netCDF data types **`char`**, **`byte`**, **`short`**, **`int`**, **`float`** or **`real`**, and **`double`** are all acceptable. The **`char`** type is not intended for numeric data. One byte numeric data should be stored using the **`byte`** data type. All integer types are treated by the netCDF interface as signed. It is possible to treat the **`byte`** type as unsigned by using the NUG convention of indicating the unsigned range using the **`valid_min`**, **`valid_max`**, or **`valid_range`** attributes.
The netCDF data types **`string`**, **`char`**, **`byte`**, **`short`**,
**`int`**, **`float`** or **`real`**, and **`double`** are all acceptable.
The **`string`** type is only available in files using the netCDF version 4
(netCDF-4) format.
The **`char`** and **`string`** types are not intended for numeric data.
One byte numeric data should be stored using the **`byte`** data type.
All integer types are treated by the netCDF interface as signed.
It is possible to treat the **`byte`** type as unsigned by using the NUG
convention of indicating the unsigned range using the **`valid_min`**,
**`valid_max`**, or **`valid_range`** attributes.

Strings in variables may be represented in one of two ways: as atomic strings or
as character arrays.
An n-dimensional array of strings may be implemented as a variable of type
**`string`** with n dimensions, or as a variable of type **`char`** with n+1
dimensions where the last (most rapidly varying) dimension is large enough to
contain the longest string in the variable.
For example, a character array variable of strings containing the names of the
months would be dimensioned (12,9) in order to accommodate "September", the
month with the longest name.
The other strings, such as "May", should be padded with trailing NULL or space
characters so that every array element is filled.
If the atomic string option is chosen, each element of the variable can be
assigned a string with a different length.
The CDL example below shows one variable of each type.

[[char-and-string-variables-ex]]
[caption="Example 1.1. "]
.String Variable Representations
====
----
dimensions:
  strings = 30 ;
  strlen = 10 ;
variables:
  char char_variable(strings,strlen) ;
    char_variable:long_name = "strings of type char" ;
  string str_variable(strings) ;
    str_variable:long_name = "strings of type string" ;
----
====

NetCDF does not support a character string type, so these must be represented as character arrays. In this document, a one dimensional array of character data is simply referred to as a "string". An n-dimensional array of strings must be implemented as a character array of dimension (n,max_string_length), with the last (most rapidly varying) dimension declared large enough to contain the longest string in the array. All the strings in a given array are therefore defined to be equal in length. For example, an array of strings containing the names of the months would be dimensioned (12,9) in order to accommodate "September", the month with the longest name.
The examples in this document that use string-valued variables alternate between
these two forms.
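
As a rough illustration of the difference between the two representations, the following Python sketch pads the month names into a fixed-width character array and recovers them again. It uses plain lists rather than a netCDF library, and the helper names `to_char_array` and `from_char_array` are hypothetical, introduced only for this example.

```python
# Plain-Python stand-in for the two storage forms; no netCDF library is used.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def to_char_array(strings, pad=" "):
    """Pad every string to the longest length and split it into characters.

    The result models a char variable dimensioned (n, strlen).
    """
    strlen = max(len(s) for s in strings)
    return [list(s.ljust(strlen, pad)) for s in strings]


def from_char_array(rows, pad=" "):
    """Join each row back into a string, dropping trailing pad or NULL characters."""
    return ["".join(row).rstrip(pad + "\x00") for row in rows]


# char form: every row has the same length, set by "September".
char_months = to_char_array(MONTHS)
shape = (len(char_months), len(char_months[0]))

# string form: the list itself, where each element keeps its own length.
string_months = list(MONTHS)
```

One consequence of the char form in this sketch is that a string whose real content ends in spaces cannot be told apart from padding once it is read back, which is not an issue for the atomic string form.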



7 changes: 4 additions & 3 deletions ch05.adoc
@@ -14,8 +14,9 @@ in which the auxiliary coordinate variables appear in the
**`coordinates`** attribute string. The dimensions of an auxiliary
coordinate variable must be a subset of the dimensions of the variable
with which the coordinate is associated, with two exceptions. First,
string-valued coordinates (<<labels>>) have a dimension for maximum
string length. Second, in the ragged array representations of data
string-valued coordinates (<<labels>>) will have a dimension for maximum string
length if the coordinate variable has a type of **`char`** rather than a type
of **`string`**. Second, in the ragged array representations of data
(<<discrete-sampling-geometries>>), special methods are needed to
connect the data and coordinates

@@ -252,7 +253,7 @@ variables:
T:units = "K" ;
T:coordinates = "lon lat" ;
T:grid_mapping = "rotated_pole" ;
char rotated_pole
char rotated_pole ;
rotated_pole:grid_mapping_name = "rotated_latitude_longitude" ;
rotated_pole:grid_north_pole_latitude = 32.5 ;
rotated_pole:grid_north_pole_longitude = 170. ;
28 changes: 16 additions & 12 deletions ch06.adoc
@@ -24,16 +24,21 @@ to indicate geographic regions.
Character strings labelling the elements of an axis are regarded as
string-valued auxiliary coordinate variables. The **`coordinates`**
attribute of the data variable names the variable that contains the
string array. An application processing the variables listed in the
**`coordinates`** attribute can recognize a string-valued auxiliary
coordinate variable because it contains an array of character data. The
inner dimension (last dimension in CDL terms) is the maximum length of
each string, and the other dimensions are axis dimensions. If a string-valued
auxiliary coordinate variable has only one dimension (the maximum length of the string),
it is a string-valued scalar coordinate variable (see <<scalar-coordinate-variables>>).
As such, it has the same information content and can be used in the same contexts as a
string-valued auxiliary coordinate variable of a size one dimension which has not been added
to the data variable. This is a convenience feature.
string array.
An application processing the variables listed in the **`coordinates`**
attribute can recognize a string-valued auxiliary coordinate variable because
it has a type of **`char`** or **`string`**.
If the variable has a type of **`char`**, the inner dimension (last dimension
in CDL terms) is the maximum length of each string, and the other dimensions
are axis dimensions.
If an auxiliary coordinate variable has a type of **`string`** and has no
dimensions, or has a type of **`char`** and has only one dimension (the maximum
length of the string), it is a string-valued scalar coordinate variable (see
<<scalar-coordinate-variables>>).
As such, it has the same information content and can be used in the same
contexts as a string-valued auxiliary coordinate variable of a size one
dimension.
This is a convenience feature.
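
The recognition rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable is described by a type name and a dimension count instead of a real netCDF object, the function names `label_axis_dims` and `is_scalar_label` are hypothetical, and a dimensionless **`char`** variable is treated as not string-valued, which the text does not address.

```python
def label_axis_dims(dtype, ndim):
    """Return the number of axis dimensions of a string-valued variable.

    Returns None when the variable is not treated as string-valued.
    """
    if dtype == "string":
        # Atomic strings: every dimension is an axis dimension.
        return ndim
    if dtype == "char" and ndim >= 1:
        # Character arrays: the last dimension holds the maximum string length.
        return ndim - 1
    # Assumption: other types, and char with no dimensions, are not labels.
    return None


def is_scalar_label(dtype, ndim):
    """True for a string-valued scalar coordinate variable."""
    return label_axis_dims(dtype, ndim) == 0
```

Under these assumptions, `is_scalar_label("string", 0)` and `is_scalar_label("char", 1)` are both true, matching the two cases named in the paragraph.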


[[geographic-regions, Section 6.1.1, "Geographic Regions"]]
@@ -52,7 +57,6 @@ dimensions:
time = 20 ;
lat = 5 ;
lbl = 1 ;
strlen = 64 ;
variables:
float n_heat_transport(time,lat,lbl);
n_heat_transport:units="W";
@@ -64,7 +68,7 @@ variables:
float lat(lat) ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
char geo_region(lbl,strlen) ;
string geo_region(lbl) ;
geo_region:standard_name="region" ;
data:
geo_region = "atlantic_ocean" ;
19 changes: 18 additions & 1 deletion ch07.adoc
@@ -342,7 +342,24 @@ variables:
data:
land_sea="land","sea";
----
If the _method_ is `mean`, various ways of calculating the mean can be distinguished in the `cell_methods` attribute with a string of the form "`mean where` _type1_ [`over` _type2_]". Here, _type1_ can be any of the possibilities allowed for _typevar_ or _type_ (as specified in the two paragraphs preceding above Example). The same options apply to _type2_, except it is not allowed to be the name of an auxiliary coordinate variable with a dimension greater than one (ignoring the dimension accommodating the maximum string length). A `cell_methods` attribute with a string of the form "`mean where` _type1_ `over` _type2_" indicates the mean is calculated by summing over the _type1_ portion of the cell and dividing by the area of the _type2_ portion. In particular, a `cell_methods` string of the form "`mean where all_area_types over` _type2_" indicates the mean is calculated by summing over all types of area within the cell and dividing by the area of the _type2_ portion. (Note that "`all_area_types`" is one of the valid strings permitted for a variable with the `standard_name` `area_type`.) If "`over` _type2_" is omitted, the mean is calculated by summing over the _type1_ portion of the cell and dividing by the area of this portion.
If the _method_ is `mean`, various ways of calculating the mean can be
distinguished in the `cell_methods` attribute with a string of the form "mean
where _type1_ [over _type2_]".
Here, _type1_ can be any of the possibilities allowed for _typevar_ or _type_
(as specified in the two paragraphs preceding the above Example).
The same options apply to _type2_, except it is not allowed to be the name of
an auxiliary coordinate variable with a dimension greater than one (ignoring
the possible dimension accommodating the maximum string length).
A `cell_methods` attribute with a string of the form "mean where _type1_
over _type2_" indicates the mean is calculated by summing over the _type1_
portion of the cell and dividing by the area of the _type2_ portion.
In particular, a `cell_methods` string of the form "mean where all_area_types
over _type2_" indicates the mean is calculated by summing over all types of
area within the cell and dividing by the area of the _type2_ portion.
(Note that `all_area_types` is one of the valid strings permitted for a
variable with the `standard_name` `area_type`.)
If "over _type2_" is omitted, the mean is calculated by summing over the
_type1_ portion of the cell and dividing by the area of this portion.
====
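
The arithmetic behind the "mean where _type1_ over _type2_" forms can be illustrated with a small Python sketch. It assumes a single cell split into named area types, each with an area and a per-unit-area value; the function `mean_where` and the numbers used are hypothetical and serve only to show which area ends up in the denominator.

```python
def mean_where(values, areas, type1, type2=None):
    """Sum the quantity over the type1 portion and divide by the type2 area.

    values: per-unit-area value for each area type in the cell
    areas:  area of each area type in the cell
    If type2 is omitted, the type1 area is used as the divisor.
    """
    def selected(area_type):
        # "all_area_types" selects every portion of the cell.
        if area_type == "all_area_types":
            return list(areas)
        return [area_type]

    total = sum(values[t] * areas[t] for t in selected(type1))
    divisor_type = type1 if type2 is None else type2
    divisor = sum(areas[t] for t in selected(divisor_type))
    return total / divisor


# Hypothetical cell: 30 units of land, 70 units of sea.
areas = {"land": 30.0, "sea": 70.0}
values = {"land": 10.0, "sea": 0.0}

land_only = mean_where(values, areas, "land")
land_over_cell = mean_where(values, areas, "land", "all_area_types")
cell_over_land = mean_where(values, areas, "all_area_types", "land")
```

With these numbers, the mean where land is 10.0, while the mean where land over all_area_types is 3.0, because the same land total is spread over the whole cell area.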

[[thickness-over-sea-area-ex]]
3 changes: 3 additions & 0 deletions history.adoc
@@ -203,3 +203,6 @@ node coordinate variables to be one of the dimensions of the data variable.

.17 July 2019
. #144 - Add <<groups, support for using groups>>.

.10 September 2019
. Issue #139: Added support for variables of type string.
