Release Notes

bekozi edited this page May 18, 2020 · 12 revisions
  • Added support for three NetCDF metadata conventions. Parallel operations are supported for each format. UGRID and ESMF Unstructured Formats support multi-geometries (polygons sharing a unique identifier).
  • Added command line chunked regrid weight generation capability using ESMPy (ESMF’s Python interface). Chunked regrid weight generation uses a spatial decomposition to calculate regridding weights by breaking source and destination grids into smaller pieces (chunks). This allows arbitrarily high resolution grids to participate in regridding without depleting machine memory.
  • Added support for concurrent parallel NetCDF writes if netCDF4-python is built with parallel support (HDF5 and NetCDF libraries are also built with parallel support). This will result in considerable performance gains for IO-bound operations such as subsetting high resolution grid files.
  • Added a joint NetCDF-GIS output format linking a NetCDF file to a GIS data file. This allows saving the output of spatially aggregating operations on multiple geometries into a single NetCDF file following the CF discrete geometry conventions. See the NetCDF Output documentation for additional information on this new format. Code contributed by David Huard from Ouranos.
  • Added a freeze-thaw calculation defined by events where freezing and thawing occurs once a threshold of degree days is reached. The index is useful to characterize wear on infrastructures such as roads. Code contributed by David Huard from Ouranos.
  • Modified time dimension output from calculations to follow the input time dimension’s unlimited state. This maintains the same data structure as the input data. Output calculation files may then be concatenated along the unlimited dimension.
  • Added support for writing string variables to NetCDF files.
  • Added fancy index distributed slicing for integer and boolean slice bases. This allows variables to be sliced using exact indices (potentially nonmonotonic or unordered) in addition to sections. This is useful when selecting elements from unstructured grids where ordering does not follow a rectangular pattern.
  • Added new argument to request datasets called rotated_pole_priority for using spherical coordinates from rotated pole datasets if those coordinates are available. This avoids unnecessarily translating rotated coordinates.
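
The spatial decomposition behind chunked regrid weight generation can be pictured with a small standalone sketch. It only splits grid index ranges into near-equal pieces and pairs them up. It does not call ESMPy or OpenClimateGIS, and the names `chunk_bounds` and `grid_chunks` are hypothetical, chosen for illustration.

```python
def chunk_bounds(size, nchunks):
    """Split range(size) into nchunks contiguous (start, stop) index pairs."""
    if nchunks < 1 or nchunks > size:
        raise ValueError("nchunks must be between 1 and size")
    base, extra = divmod(size, nchunks)
    bounds = []
    start = 0
    for i in range(nchunks):
        # The first `extra` chunks take one additional element each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def grid_chunks(nrows, ncols, row_chunks, col_chunks):
    """Pair row and column bounds to describe rectangular pieces of a grid."""
    return [
        (rows, cols)
        for rows in chunk_bounds(nrows, row_chunks)
        for cols in chunk_bounds(ncols, col_chunks)
    ]


if __name__ == "__main__":
    for rows, cols in grid_chunks(4, 6, 2, 3):
        print("rows", rows, "cols", cols)
```

Each piece can then be processed on its own, so only one chunk of the source and destination grids needs to be held in memory at a time.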

Known Issues Fixed in this Release

  • The optional dependencies ESMPy (high-performance regridding) and ICCLIM (European Climate Assessment indices calculation) now support Python 3.

Known Issues

  • Testing revealed that MPI code hangs with some MPI calls through “mpi4py” when using Python 3.5. Users should use Python 2.7 or Python 3.6 for parallel operations.

Known Issues Carried Over

  • None
  • A change log detailing code changes in v2.x that may affect users migrating from v1.3.x may be found here: http://ocgis.readthedocs.io/en/v-2.0/changelog.html#version-2-x-backwards-compatibility.
  • Added support for MPI parallelization in operations. Primary OpenClimateGIS operations such as subsetting and calculations are now fully parallel resulting in significant performance improvement for large data manipulations. The optional dependency “mpi4py” is required to run OpenClimateGIS in parallel. An overview of OpenClimateGIS parallelism may be found here: http://ocgis.readthedocs.io/en/v-2.0/parallel.html.
  • Relaxed prior dimension constraint on operations where data variables were required to have a time dimension for spatial subsetting. Arbitrary dimension ordering and counts are now fully supported.
  • Improved variable spatial subsetting and slicing by “cascading” slices across all variables with shared dimensions. All variables in a dataset sharing a spatial dimension with the coordinate variables are now sliced and no longer selectively removed from output datasets. Masks also cascade similarly to slicing.
  • Added support for Python 3 (tested on versions 3.5 and 3.6) while maintaining backward compatibility with Python 2.7.
  • Improved coordinate system transformation performance by changing the underlying transformation utility from “gdal” to “pyproj”. “pyproj” is a new dependency.
  • Optimized subsetting to lower memory overhead and improve serial performance. In addition to parallelization performance boosts, the grid subsetting algorithm was refactored to take better advantage of hint masks and generators. Significantly less memory is required for subsetting operations by creating geometries on-demand.
  • Added option for optimized bounding box subsets. This new argument to operations bypasses geometry creation during subsetting in favor of logical comparisons with bounding box extents. This feature is useful for rectilinear bounding box subsets against regular, structured grids. A description of the new argument may be found here: http://ocgis.readthedocs.io/en/v-2.0/operations.html#optimized-bbox-subset.
  • Added option to subset with line geometries. With this addition, OpenClimateGIS now supports all simple geometry types (point, polygon, line) for subsetting.
  • Improved handling of subset spatial masks through the use of an independent masking variable. Spatial masking following an arbitrary geometry subset was indistinguishable from data masks existing prior to a subset. To address this, OpenClimateGIS now appends a subset spatial mask variable to netCDF subsets to persist (and potentially reuse) the spatial mask: http://ocgis.readthedocs.io/en/v-2.0/appendix.html#spatial-masking.
  • Added option to spatially reorder spherical coordinates following a wrap operation. This ensures data is ascendingly ordered from -180 to 180 degrees longitude: http://ocgis.readthedocs.io/en/v-2.0/operations.html#spatial-reorder.
  • Added option to wrap (-180 to 180 longitude) or unwrap (0 to 360 longitude) for spherical coordinate systems. A new option to operations allows users to control spatial wrapping for gridded datasets: http://ocgis.readthedocs.io/en/v-2.0/operations.html#spatial-wrapping.
  • Allowed arbitrary ordering of geometry selection identifiers.
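
The wrap, unwrap, and spatial reorder options above can be illustrated with plain arithmetic. This is a minimal sketch and not the OpenClimateGIS implementation; the function names are hypothetical, and the handling of exactly 180 degrees (mapped to -180 here) is an assumption that may differ from the library.

```python
def wrap_longitude(lon):
    """Map a longitude onto the -180 to 180 range (180 maps to -180 here)."""
    return ((lon + 180.0) % 360.0) - 180.0


def unwrap_longitude(lon):
    """Map a longitude onto the 0 to 360 range."""
    return lon % 360.0


def reorder_ascending(lons, values):
    """Sort longitudes in ascending order and carry the data values along."""
    order = sorted(range(len(lons)), key=lambda i: lons[i])
    return [lons[i] for i in order], [values[i] for i in order]


if __name__ == "__main__":
    unwrapped = [0.0, 90.0, 180.0, 270.0]
    wrapped = [wrap_longitude(x) for x in unwrapped]
    # After wrapping, the coordinates are no longer in ascending order,
    # which is the situation the reorder option addresses.
    print(reorder_ascending(wrapped, ["a", "b", "c", "d"]))
```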

Known Issues Fixed in this Release

  • NARCCAP 2-d coordinate variables are not subsetted and properly written to NetCDF output. Currently, only row and column coordinate representations are enabled for read/write to NetCDF (https://github.com/NCPP/ocgis/issues/143). Shared dimensions are now subset.

Known Issues

  • ESMPy and ICCLIM do not support Python 3, except that ESMPy supports Python 3 if installed from source. Guidance on Python versions and OpenClimateGIS may be found here: http://ocgis.readthedocs.io/en/v-2.0/install.html#supported-python-versions.
  • Testing revealed undefined behavior with some MPI calls through “mpi4py” when using Python 3.5. Users should use Python 2.7 or Python 3.6 for parallel operations.

Known Issues Carried Over

Known Issues Fixed in this Release

  • Queue exception is being masked by the logging engine in an MPI environment with a single thread (https://github.com/NCPP/ocgis/issues/227). Exception could not be reproduced.
  • If data input types are different between request datasets (i.e. integer and float), ESRI Shapefile conversion fails. An appropriate exception or casting of data types must be enabled for this to work properly (https://github.com/NCPP/ocgis/issues/215). Issue fixed with newer dependency versions.

Known Issues Carried Over

Known Issues Fixed in this Release

Known Issues Carried Over

  • Updated regridding operations to use ESMPy v7.0.0. ESMPy is the Python interface to the Earth System Modeling Framework (ESMF) regridding. To learn more about ESMPy please visit: http://www.earthsystemmodeling.org/esmf_releases/non_public/ESMF_7_0_0/esmpy_doc/html/intro.html. ESMPy v6.3.0rp1 is no longer supported in OpenClimateGIS.
  • Added Mac OSX-64 and Windows-64 Python packages to the Integrated Ocean Observing System (IOOS) Anaconda channel. An “Anaconda channel” is a collection of Anaconda packages managed by an organization. The IOOS organization provides a robust continuous integration system for building cross-platform scientific software. Installation documentation for new IOOS Anaconda packages is located at: http://ocgis.readthedocs.org/en/latest/install.html#anaconda-package.
  • Migrated documentation to ReadTheDocs: http://ocgis.readthedocs.org/en/latest/. ReadTheDocs is a widely used Python documentation build system. It creates professional and version-specific documentation.
  • Added support for an additional unit conversion backend called “cf_units”: https://github.com/SciTools/cf_units. “cf_units” supports Mac OSX and Python 3.x versions out of the box and has an active development and user community.
  • Added a new regrid option called “split” which supports bulk regridding operations along undistributed dimensions (i.e. time). One regrid operation per time coordinate remains the default. This option was added to allow users to customize the trade-offs between memory usage and performance. See: http://ocgis.readthedocs.org/en/latest/functions.html#ocgis.regrid.base.iter_esmf_fields.
  • Removed regridding constraints on source and destination spatial extents by taking advantage of ESMPy’s unmapped points management. This now allows partially overlapping spatial extents to be used in OpenClimateGIS regridding operations.
  • Added a new operations argument called “output_format_options”, which allows format-specific customizations for netCDF outputs. This includes converting unlimited dimensions to fixed size, enabling variable-level compression, and control of the output netCDF data model. See: http://ocgis.readthedocs.org/en/v1.3.0/api.html#output-format-options.
  • Updated ICCLIM, a Python library for the calculation of European Climate Indices, to use version 4.1.1. User-defined indices in ICCLIM are not supported.
  • Modified netCDF variable scanning when retrieving coordinate system information. Previously, data variables were required to have a “grid_mapping” attribute. Now, if a “grid_mapping_name” attribute is found on a scanned netCDF variable it is used as the dataset’s coordinate system.
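
The modified coordinate system scan can be sketched without any NetCDF library by representing each variable as a dictionary of attributes. The function name `find_coordinate_system` and the sample variables are hypothetical; only the “grid_mapping_name” attribute comes from the note above.

```python
def find_coordinate_system(variables):
    """Return (variable name, grid mapping name) for the first variable that
    carries a grid_mapping_name attribute, or None if no variable has one."""
    for name, attrs in variables.items():
        if "grid_mapping_name" in attrs:
            return name, attrs["grid_mapping_name"]
    return None


if __name__ == "__main__":
    variables = {
        # A data variable that lacks the "grid_mapping" attribute.
        "tas": {"units": "K"},
        # A variable describing the coordinate system itself.
        "crs": {"grid_mapping_name": "lambert_conformal_conic"},
    }
    print(find_coordinate_system(variables))
```

With this approach the coordinate system is found even though the data variable never points to it.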

Known Issues Fixed in this Release

Known Issues

Known Issues Carried Over

  • Created an Anaconda package including all optional dependencies. Currently, only Linux-64 packages are available. OSX-64 builds are in development and will be posted as soon as they are ready. One of the optional dependencies, ESMPy (a Python interface to ESMF regridding) must be installed separately because of an issue with the Anaconda package solver. http://ocgis.readthedocs.org/en/v1.2.1/install.html#anaconda-installation
  • Fixed an issue with the GDAL library not finding datum files for PROJ.4 coordinate systems. Previously, some users needed to set an environment variable (GDAL_DATA) pointing to the proper directory. Now, the software automatically sets the variable if data files are missing at runtime and issues a warning if the software automatically configures the directory location.
  • Added an optional key to the calculation definition dictionary allowing users to add global (i.e. dataset) attributes to output netCDF files. http://ocgis.readthedocs.org/en/v1.2.1/computation.html#using-computations
  • Added the ability to use shapefiles without the unique identifier -- “UGID” -- attribute. If the input shapefile contains a unique identifier, it may be used with the new option "geom_uid". If no unique identifier is selected, one will be created. The “select_ugid” option was also renamed to "geom_select_uid". The old "select_ugid" parameter name will still work. http://ocgis.readthedocs.org/en/v1.2.1/api.html#id4
  • Added "geom_select_sql_where" to operations. This allows a string representing the "WHERE" clause of a SQL statement to be used for selecting geometries from source files based on attribute constraints. This streamlines geospatial file access. http://ocgis.readthedocs.org/en/v1.2.1/api.html#geom-select-sql-where
  • Added a new parameter to operations called "time_subset_func" for passing arbitrary Python predicate functions to the time subsetter. This allows custom time period filtering. http://ocgis.readthedocs.org/en/v1.2.1/api.html#time-subset-func
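
As a rough illustration of custom time period filtering with a predicate function, the sketch below selects time indices for which a user-supplied function returns true. The helper `subset_times` and the one-argument predicate signature are assumptions made for this example; the signature that "time_subset_func" actually expects should be taken from the linked documentation.

```python
from datetime import datetime


def subset_times(times, predicate):
    """Return the indices of the time values for which predicate(value) is true."""
    return [i for i, t in enumerate(times) if predicate(t)]


def is_summer(t):
    """Example predicate: keep June, July, and August."""
    return t.month in (6, 7, 8)


if __name__ == "__main__":
    times = [datetime(2001, month, 15) for month in range(1, 13)]
    print(subset_times(times, is_summer))
```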

Known Issues Fixed in this Release

  • Resolved the issue of multipoint geometry unions producing unexpected behavior in ESRI Shapefile and NetCDF outputs. The buffering of multipoint geometries for subsetting is not happening appropriately. The individual points should be buffered before the union (https://github.com/NCPP/ocgis/issues/303). This issue could not be reproduced in the current version.
  • Fixed the issue of overloaded values from RequestDataset not being reflected in inspect output. Inspect output prints file metadata in addition to computed information such as spatiotemporal extent and resolution (https://github.com/NCPP/ocgis/issues/230). Inspection was refactored to allow a common implementation across drivers. Inspection output for field dimensions is now a method on the dimension itself with special cases for time and space.

Known Issues

  • None

Known Issues Carried Over

  • Added support for European Climate Assessment and Dataset indices and indicators using the ICCLIM (Index Calculation CLIMate) Python library. ICCLIM calculations fully integrate with other OpenClimateGIS operations (i.e. subsetting, format conversion). ICCLIM is used by OpenClimateGIS’s European partners for the analysis of climate data. Documentation on how to call ICCLIM functions from OpenClimateGIS is found here.
  • Added a new, default, non-melted tabular output format structure. The previous file structure is still available and may be enabled via a new operations argument or a new environment variable. This change occurred to support input formats such as ESRI Shapefile where a melted format unnecessarily increases the output file size. The primary difference between a melted and non-melted output format is that variables (i.e. temperature, county name) in a non-melted format occur as individual columns. This change is potentially backward incompatible.
  • Switched the Python setup and installation library from distutils to the third-party setuptools library. Setuptools extends and improves the standard Python installer package and is widely used by the Python community. Setuptools is not part of the standard Python distribution, and it will need to be installed separately. This change is potentially backward incompatible.
  • Upgraded required Fiona version to 1.4.5. There were minor changes to Fiona that may cause issues. Upgrading this dependency is strongly encouraged. This change is potentially backward incompatible.
  • Added the ability to conform (convert) time units (i.e. “days since 2000-1-1”) for request datasets. The calendar (i.e. “gregorian”) may not be changed. This is useful for creating collections of data with the same time units/origins.
  • Changed the default name for the joint CSV-ESRI Shapefile output from “csv+” to “csv-shp”. Note the old “csv+” key will still work. The old “csv+” key was cryptic and needed to change.
  • Added a new command to the setup routine (“test”), which executes a subset of the larger test suite.
  • Added a utility function that processes a shapefile to add a unique identifier. Shapefile unique identifiers are required by OpenClimateGIS when reading geometry data for subsetting.
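
The difference between the melted and non-melted tabular structures can be shown with a few rows of made-up data. This is a sketch only; the column names (“time”, “gid”, “variable”, “value”) and the helper `unmelt` are hypothetical and are not the actual OpenClimateGIS output headers.

```python
def unmelt(rows, key_fields, name_field="variable", value_field="value"):
    """Convert melted rows (one row per variable) into non-melted rows
    (one column per variable), grouped by the key fields."""
    table = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        # Start a new output record holding the key columns the first time
        # a key is seen, then add one column per variable name.
        record = table.setdefault(key, dict(zip(key_fields, key)))
        record[row[name_field]] = row[value_field]
    return list(table.values())


if __name__ == "__main__":
    melted = [
        {"time": "2000-01-01", "gid": 1, "variable": "tas", "value": 271.5},
        {"time": "2000-01-01", "gid": 1, "variable": "pr", "value": 0.2},
        {"time": "2000-01-02", "gid": 1, "variable": "tas", "value": 272.0},
        {"time": "2000-01-02", "gid": 1, "variable": "pr", "value": 0.0},
    ]
    for record in unmelt(melted, key_fields=("time", "gid")):
        print(record)
```

Four melted rows collapse into two non-melted rows, which is why the non-melted structure keeps output files smaller when many variables share the same keys.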

Known Issues Fixed in this Release

  • Missing documentation for ShpProcess. The ShpProcess object prepares ESRI Shapefiles for use in subsetting by adding a unique identifier and checking for valid geometries (https://github.com/NCPP/ocgis/issues/248). This was fixed by adding a utility function that processes a shapefile circumventing the need to use ShpProcess.
  • Import errors during setup should pass along the original error message. This is important for import errors related to linked libraries where the error message will help with debugging (https://github.com/NCPP/ocgis/issues/295). This was fixed through the transition of setup routines from distutils to setuptools which will automatically signal the user if a dependency is missing.

Known Issues

  • None

Known Issues Carried Over

  • Multipoint geometry unions produce unexpected behavior in ESRI Shapefile and NetCDF outputs. The buffering of multipoint geometries for subsetting is not happening appropriately. The individual points should be buffered before the union. (https://github.com/NCPP/ocgis/issues/303)
  • Queue exception is being masked by the logging engine in an MPI environment with a single thread. (https://github.com/NCPP/ocgis/issues/227)
  • Overloaded values from RequestDataset are not being reflected in inspect output. Inspect output prints file metadata in addition to computed information such as spatiotemporal extent and resolution. (https://github.com/NCPP/ocgis/issues/230)
  • If data input types are different between request datasets (i.e. integer and float), ESRI Shapefile conversion fails. An appropriate exception or casting of data types must be enabled for this to work properly. (https://github.com/NCPP/ocgis/issues/215)
  • NARCCAP 2-d coordinate variables are not subsetted and properly written to NetCDF output. Currently, only row and column coordinate representations are enabled for read/write to NetCDF. (https://github.com/NCPP/ocgis/issues/143)
  • Masked values are included in sample size calculations. Masked values are not used in calculations and should not be counted as part of the sample size for the input value set. (https://github.com/NCPP/ocgis/issues/142)
  • Fixed interpolation issue over poles (https://github.com/NCPP/ocgis/issues/335). Latitude and longitude index locations were incorrect on ESMPy Grid objects triggering pole extrapolation by ESMF. The index locations are now set correctly.

Known Issues Fixed in this Release

Known Issues

  • None

Known Issues Carried Over

  • Added bilinear and first order conservative regridding support for spherical coordinate systems using ESMPy (https://www.earthsystemcog.org/projects/esmpy/). An overview of regridding in OpenClimateGIS may be found here: http://ncpp.github.io/ocgis/regrid.html. Regridding options are available on OcgOperations (http://ncpp.github.io/ocgis/api.html#regrid-destination) and RequestDataset (http://ncpp.github.io/ocgis/api.html#dataset). Regridding of multiple input files to a common grid is important for intercomparison projects, visualization, and other analysis activities.
  • Added a helper function to sort a list of file locations by their time dimension. Multi-file concatenation in netCDF4-python does not sort input files by time extents. This function may be used if the temporal ordering of the files is not known a priori and data in multiple files must be accessed as a single, virtual file.
  • Added standard and long names by default to all calculations in the codebase. Arbitrary metadata attributes or overloading of the default standard/long names may be accomplished via a new ‘meta_attrs’ key in the calculation parameter dictionary. This continues OpenClimateGIS’s support for the NetCDF Climate and Forecast Convention.
  • Added moving window summary statistics to the calculation library. These include the commonly used mean, median, minimum, maximum, variance, and standard deviation metrics.
  • Added a temporal sum function to the calculation library. Summation is another common metric used for temporal aggregations (e.g. daily data summing to monthly data).
  • Modified the internal spatial operations to now rely on SpatialDimension objects for simplifying the manipulation (i.e. wrapping, coordinate system transformations) of the spatial objects in the subset workflow.
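
A moving window summary statistic can be sketched with the standard library alone. The example assumes a centered window with an odd width and returns values only where the window fits entirely inside the series; how OpenClimateGIS treats the edges is not stated above, so that choice, along with the name `moving_window`, is an assumption.

```python
from statistics import mean, median


def moving_window(values, width, func=mean):
    """Apply func over a centered window of odd width, skipping the edges
    where a full window is not available."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    return [
        func(values[i - half:i + half + 1])
        for i in range(half, len(values) - half)
    ]


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5]
    print(moving_window(series, 3))          # moving mean
    print(moving_window(series, 3, median))  # moving median
    print(moving_window(series, 3, max))     # moving maximum
```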

Known Issues Fixed in this Release

  • None

Known Issues

Known Issues Carried Over

  • Added support for scientific unit transformations using “cfunits-python” (https://code.google.com/p/cfunits-python/). This means that output datasets can now be on different units from the source datasets. This package is optional. If the package is installed, options to conform units are available at the request or operations level (http://ncpp.github.io/ocgis/api.html#conform-units-to).
  • Improved the performance of complicated spatial operations by using the Python package “rtree” (http://toblerity.org/rtree/) to construct a runtime spatial index. Installation of this new package is optional.
  • Added seasonal aggregations as an additional temporal grouping method for calculations (http://ncpp.github.io/ocgis/api.html#calc-grouping).
  • Multiple variables may now be requested from a single target (i.e. OPeNDAP URL) without constructing multiple request objects (http://ncpp.github.io/ocgis/api.html#dataset).
  • OpenClimateGIS will now discover subsettable variables within a file or suite of files and automatically configure the data request if no variable is specified by the user (http://ncpp.github.io/ocgis/api.html#dataset).
  • Added a small change to the geometry selection from an ESRI Shapefile. Users may now use a file path to an input Shapefile in addition to the key selection method used previously.
  • For NetCDF output, a string representation of OpenClimateGIS operations is now appended to the global history attribute on output files.
  • Added an option to spatially select the nearest data point from a point selection geometry (http://ncpp.github.io/ocgis/api.html#select-nearest).
  • In previous releases, time subsetting could only be configured at the request dataset level. Time subsetting arguments (i.e. time range and time region) have also been made available at the operations level. This enables users to specify a time subset across multiple request datasets with one argument (http://ncpp.github.io/ocgis/api.html#time-range).
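
Seasonal aggregation amounts to grouping time steps by sets of months before applying a calculation. The sketch below assumes seasons are given as lists of month numbers and groups purely by month, so a December value joins January and February of the same calendar year. Both the helper `seasonal_means` and that simplification are assumptions, not the library's behavior.

```python
from datetime import date
from statistics import mean


def seasonal_means(samples, seasons):
    """Average (date, value) samples per season, where each season is a
    list of month numbers. Months not listed in any season are ignored."""
    month_to_season = {m: tuple(season) for season in seasons for m in season}
    grouped = {}
    for when, value in samples:
        season = month_to_season.get(when.month)
        if season is not None:
            grouped.setdefault(season, []).append(value)
    return {season: mean(values) for season, values in grouped.items()}


if __name__ == "__main__":
    samples = [
        (date(2000, 1, 15), 1.0),
        (date(2000, 2, 15), 3.0),
        (date(2000, 7, 1), 10.0),
    ]
    print(seasonal_means(samples, [[12, 1, 2], [6, 7, 8]]))
```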

Known Issues Fixed in this Release

Known Issues

  • None

Known Issues Carried Over

  • Added a new calculation grouping that summarizes over the entire subsetted time range: http://ncpp.github.io/ocgis/api.html#calc-grouping.
  • Modified the setup routine so that it now understands the "--prefix" argument allowing installation to a custom location.
  • Added a new option to interpolate spatial bounds if they are not present in the source dataset.
  • Joshua Sims (University of Michigan-Ann Arbor, GLISAclimate) contributed a script using OpenClimateGIS in an MPI environment.
  • Added a NEW DEPENDENCY: Fiona (https://pypi.python.org/pypi/Fiona) for conversion to OGR data formats (i.e. ESRI Shapefile). Fiona may be installed via “pip” or “easy_install”.
  • Added a new output format GeoJSON using Fiona (http://ncpp.github.io/ocgis/api.html#output-format).
  • Introduced a modified NumPy interface accommodating 5-dimensional data representations (i.e. ensembles) and simplified inheritance structure for introducing additional source datasets (http://ncpp.github.io/ocgis/api.html#data-collections).
  • Made improvements to projection handling for spatially referenced netCDF files to accommodate Fiona coordinate system representations.
  • Changed “CALC_NAME” to “CALC_KEY” in all tabular output formats.
  • The variable alias (i.e. alias keyword argument to RequestDataset) is now appended to the end of all calculation names. This was done to allow computations to be applied to multiple request datasets without the need to generate unique calculation names.
  • Sample size is no longer calculated by default. It may be enabled via the "calc_sample_size" argument to OcgOperations.
  • Changed how points are used as a selection geometry. A new OcgOperations argument "search_radius_mult" (http://ncpp.github.io/ocgis/api.html#search-radius-mult) is used to buffer the point prior to selecting data. In past releases, the nearest data point was chosen.
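
One simple way to interpolate missing spatial bounds is to place cell edges at the midpoints between neighboring cell centers and extend the outer edges by the same half spacing. The sketch below follows that midpoint approach for a one-dimensional coordinate. It is an assumption for illustration, and the algorithm used by OpenClimateGIS may differ.

```python
def interpolate_bounds(centers):
    """Estimate (lower, upper) bounds for each cell from 1-d cell centers."""
    if len(centers) < 2:
        raise ValueError("at least two centers are needed to estimate spacing")
    # Interior edges sit halfway between adjacent centers.
    mids = [(a + b) / 2.0 for a, b in zip(centers, centers[1:])]
    # Outer edges mirror the nearest interior edge about the outer center.
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    edges = [first] + mids + [last]
    return list(zip(edges, edges[1:]))


if __name__ == "__main__":
    print(interpolate_bounds([0.5, 1.5, 2.5]))
```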

Known Issues Fixed in this Release

  • Outdated documentation for ShpCabinet. (Documentation updated).
  • Resolution and extent properties of NcGridMatrixDimension fail. (NcGridMatrixDimension removed as of this release).
  • Inspect output should report calendar as 'None' if it is not set. Calendar was being reported as 'standard' which masked the true state of the file metadata. (https://github.com/NCPP/ocgis/issues/224).
  • Default calendar should be 'standard'. An exception occurs if the attribute is not available in file metadata. (https://github.com/NCPP/ocgis/issues/223).

Known Issues

  • None

Known Issues Carried Over

  • Added SQL access to ESRI Shapefile selection for lower read delays on large shapefiles. This utilizes the SQL query capability of OGR negating the need to loop over the entire shapefile contents.
  • Added a ShpProcess object to assist in shapefile conversion (requires Fiona). Shapefiles now require a unique, integer UGID field. The UGID requirement was added for consistency within the software, to avoid miscellaneous configuration, and ease attribute joining to original shapefiles.
  • Included header dump of netCDF file attributes in disk outputs (*_source_metdata.txt). This is functionally equivalent to the output provided by “ncdump -h ”.
  • Added the following attributes to *_did.csv: standard name, long name, units.
  • Added a "headers" parameter to OcgOperations to allow selection of file headers. This allows a user to limit the columns in an ASCII output.
  • Added "time_region" to RequestDataset to subset by arbitrary month/year combinations in addition to the “time_range” parameter providing lower and upper subset bounds.
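
The difference between subsetting by a time range and by a time region can be sketched as two small filters. The dictionary layout with “month” and “year” keys and the helper names are assumptions for this example and should be checked against the API documentation before use.

```python
from datetime import datetime


def in_time_range(t, lower, upper):
    """True if t falls between the lower and upper bounds, inclusive."""
    return lower <= t <= upper


def in_time_region(t, region):
    """True if t matches the requested months and years. A missing or None
    entry places no constraint on that component."""
    months = region.get("month")
    years = region.get("year")
    month_ok = months is None or t.month in months
    year_ok = years is None or t.year in years
    return month_ok and year_ok


if __name__ == "__main__":
    times = [datetime(y, m, 1) for y in (2001, 2002) for m in (1, 6, 7)]
    region = {"month": [6, 7], "year": [2002]}
    print([t for t in times if in_time_region(t, region)])
```

A range selects one contiguous span, while a region can pick the same months out of several non-adjacent years.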

Known Issues Fixed in this Release

  • None

Known Issues

  • None

Known Issues Carried Over