
Configurable servers #312

Merged (14 commits) on Oct 2, 2023
15 changes: 5 additions & 10 deletions CONTRIBUTING.rst
@@ -4,8 +4,7 @@
Contributing
============

Contributions are welcome, and they are greatly appreciated! Every little bit
helps, and credit will always be given.
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

@@ -26,21 +25,17 @@ If you are reporting a bug, please include:
Fix Bugs
~~~~~~~~

Look through the GitHub issues for bugs. Anything tagged with "bug" and "help
wanted" is open to whoever wants to implement it.
Look through the GitHub issues for bugs. Anything tagged with "bug" and "help wanted" is open to whoever wants to implement it.

Implement Features
~~~~~~~~~~~~~~~~~~

Look through the GitHub issues for features. Anything tagged with "enhancement"
and "help wanted" is open to whoever wants to implement it.
Look through the GitHub issues for features. Anything tagged with "enhancement" and "help wanted" is open to whoever wants to implement it.

Write Documentation
~~~~~~~~~~~~~~~~~~~

RavenPy could always use more documentation, whether as part of the
official RavenPy docs, in docstrings, or even on the web in blog posts,
articles, and such.
RavenPy could always use more documentation, whether as part of the official RavenPy docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback
~~~~~~~~~~~~~~~
@@ -90,7 +85,7 @@ Ready to contribute? Here's how to set up `ravenpy` for local development.

$ flake8 ravenpy tests
$ black --check ravenpy tests
$ python setup.py test # or `pytest`
$ pytest tests
$ tox

To get flake8, black, and tox, just pip install them into your virtualenv.
10 changes: 9 additions & 1 deletion HISTORY.rst
@@ -4,7 +4,15 @@ History

0.12.4 (unreleased)
-------------------
* In tests, set xclim' missing value option to ``skip``. As of xclim 0.45, missing value checks are applied to ``fit`` indicator, meaning that parameters will be set to None if missing values are found in the fitted time series. Wrap calls to ``fit`` with ``xclim.set_options(check_missing="skip")`` to reproduce the previous behavior of xclim.

Breaking changes
^^^^^^^^^^^^^^^^
* In tests, set `xclim`'s missing value option to ``skip``. As of `xclim` v0.45, missing value checks are applied to the ``fit`` indicator, meaning that parameters will be set to `None` if missing values are found in the fitted time series. Wrap calls to ``fit`` with ``xclim.set_options(check_missing="skip")`` to reproduce the previous behavior of xclim.
* `RavenPy` processes and tests that depend on remote THREDDS/GeoServer now allow for optional server URL and file location targets. These can be set with the following environment variables:
* `RAVENPY_THREDDS_URL`: URL to the THREDDS-hosted climate data service. Defaults to `https://pavics.ouranos.ca/twitcher/ows/proxy/thredds`.
* `RAVENPY_GEOSERVER_URL`: URL to the GeoServer-hosted vector/raster data. Defaults to `https://pavics.ouranos.ca/geoserver`.
* This environment variable was previously called `GEO_URL` and was renamed to narrow its scope to RavenPy.
* The `_determine_upstream_ids` function under `ravenpy.utilities.geoserver` has been removed as it was a duplicate of `ravenpy.utilities.geo.determine_upstream_ids`. The latter function is now used in its place.

0.12.3 (2023-08-25)
-------------------
20 changes: 15 additions & 5 deletions docs/installation.rst
@@ -5,8 +5,7 @@ Installation
Anaconda Python Installation
----------------------------

For many reasons, we recommend using a `Conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`_
to work with the full RavenPy installation. This implementation is able to manage the harder-to-install GIS dependencies, like `GDAL`.
For many reasons, we recommend using a `Conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`_ to work with the full RavenPy installation. This implementation is able to manage the harder-to-install GIS dependencies, like `GDAL`.

Begin by creating an environment:

@@ -26,8 +25,7 @@ RavenPy can then be installed directly via its `conda-forge` package by running:

(ravenpy) $ conda install -c conda-forge ravenpy

This approach installs the `Raven <http://raven.uwaterloo.ca>`_ binary directly to your environment `PATH`,
as well as installs all the necessary Python and C libraries supporting GIS functionalities.
This approach installs the `Raven <http://raven.uwaterloo.ca>`_ binary directly to your environment `PATH`, and installs all the necessary Python and C libraries supporting GIS functionalities.

Python Installation (pip)
-------------------------
@@ -71,10 +69,22 @@ Once downloaded/compiled, the binary can be pointed to manually (as an absolute

$ export RAVENPY_RAVEN_BINARY_PATH=/path/to/my/custom/raven

Customizing remote service datasets
-----------------------------------

A number of functions and tests within `RavenPy` are dependent on remote services (THREDDS, GeoServer) for providing climate datasets, hydrological boundaries, and other data. These services are provided by `Ouranos <https://www.ouranos.ca>`_ through the `PAVICS <https://pavics.ouranos.ca>`_ project and may be subject to change in the future.

If for some reason you wish to use alternate services, you can set the following environment variables to point to your own instances of THREDDS and GeoServer:

.. code-block:: console

$ export RAVENPY_THREDDS_URL=https://my.domain.org/thredds
$ export RAVENPY_GEOSERVER_URL=https://my.domain.org/geoserver
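Downstream code can pick these variables up with a standard fallback pattern. A minimal sketch (the `service_urls` helper is hypothetical, not part of RavenPy's API; the defaults mirror the public PAVICS endpoints documented above):

```python
import os

# Public PAVICS defaults, used when the variables are unset.
# The trailing slash on the THREDDS URL matters for later urljoin calls.
DEFAULT_THREDDS = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
DEFAULT_GEOSERVER = "https://pavics.ouranos.ca/geoserver"


def service_urls() -> dict:
    """Hypothetical helper: resolve remote service endpoints from the environment."""
    return {
        "thredds": os.environ.get("RAVENPY_THREDDS_URL", DEFAULT_THREDDS),
        "geoserver": os.environ.get("RAVENPY_GEOSERVER_URL", DEFAULT_GEOSERVER),
    }


# Overriding one service leaves the other on its default.
os.environ["RAVENPY_THREDDS_URL"] = "https://my.domain.org/thredds"
print(service_urls()["thredds"])  # https://my.domain.org/thredds
```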

Development Installation (from sources)
---------------------------------------

The sources for RavenPy can be obtained from the GitHub repo:
The sources for `RavenPy` can be obtained from the GitHub repo:

.. code-block:: console

67 changes: 56 additions & 11 deletions ravenpy/extractors/forecasts.py
@@ -1,8 +1,10 @@
import datetime as dt
import logging
import os
import re
from pathlib import Path
from typing import Any, List, Tuple, Union
from urllib.parse import urljoin

import pandas as pd
import xarray as xr
@@ -19,6 +21,12 @@

LOGGER = logging.getLogger("PYWPS")

# Do not remove the trailing / otherwise `urljoin` will remove the thredds path.
# Can be set at runtime with `$ env RAVENPY_THREDDS_URL=https://xx.yy.zz/thredds/ ...`.
THREDDS_URL = os.environ.get(
"RAVENPY_THREDDS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
)
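The trailing-slash caveat in the comment above comes from how `urllib.parse.urljoin` resolves relative references; a quick illustration:

```python
from urllib.parse import urljoin

# With a trailing slash, the last path segment is kept and the relative
# path is appended beneath it.
with_slash = urljoin(
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/",
    "dodsC/birdhouse/disk2/caspar/daily/GEPS_20230101.nc",
)

# Without the trailing slash, urljoin treats "thredds" as a file name and
# replaces it, silently dropping that part of the base path.
without_slash = urljoin(
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds",
    "dodsC/birdhouse/disk2/caspar/daily/GEPS_20230101.nc",
)

print(with_slash)     # .../proxy/thredds/dodsC/birdhouse/disk2/caspar/daily/GEPS_20230101.nc
print(without_slash)  # .../proxy/dodsC/birdhouse/disk2/caspar/daily/GEPS_20230101.nc
```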


def get_hindcast_day(region_coll: fiona.Collection, date, climate_model="GEPS"):
"""Generate a forecast dataset that can be used to run raven.
@@ -38,15 +46,36 @@ def get_hindcast_day(region_coll: fiona.Collection, date, climate_model="GEPS"):


def get_CASPAR_dataset(
climate_model: str, date: dt.datetime
climate_model: str,
date: dt.datetime,
thredds: str = THREDDS_URL,
directory: str = "dodsC/birdhouse/disk2/caspar/daily/",
) -> Tuple[
xr.Dataset, List[Union[Union[DatetimeIndex, Series, Timestamp, Timestamp], Any]]
]:
"""Return CASPAR dataset."""
"""Return CASPAR dataset.

Parameters
----------
climate_model : str
Type of climate model, for now only "GEPS" is supported.
date : dt.datetime
The date of the forecast.
thredds : str
The thredds server url. Default: "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
directory : str
The directory on the thredds server where the data is stored. Default: "dodsC/birdhouse/disk2/caspar/daily/"

Returns
-------
xr.Dataset
    The forecast dataset.
list
    The forecast times, subset to a constant 6-hour interval.
"""

if climate_model == "GEPS":
d = dt.datetime.strftime(date, "%Y%m%d")
file_url = f"https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/disk2/caspar/daily/GEPS_{d}.nc"
file_location = urljoin(directory, f"GEPS_{d}.nc")
file_url = urljoin(thredds, file_location)
ds = xr.open_dataset(file_url)
# Here we also extract the times at 6-hour intervals as Raven must have
# constant timesteps and GEPS goes to 6 hours
@@ -66,14 +95,31 @@

def get_ECCC_dataset(
climate_model: str,
thredds: str = THREDDS_URL,
directory: str = "dodsC/datasets/forecasts/eccc_geps/",
) -> Tuple[
Dataset, List[Union[Union[DatetimeIndex, Series, Timestamp, Timestamp], Any]]
]:
"""Return latest GEPS forecast dataset."""
"""Return latest GEPS forecast dataset.

Parameters
----------
climate_model : str
Type of climate model, for now only "GEPS" is supported.
thredds : str
The thredds server url. Default: "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
directory : str
The directory on the thredds server where the data is stored. Default: "dodsC/datasets/forecasts/eccc_geps/"

Returns
-------
xr.Dataset
    The forecast dataset.
list
    The forecast times, subset to a constant 6-hour interval.
"""
if climate_model == "GEPS":
# Eventually the file will find a permanent home, until then let's use the test folder.
file_url = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/forecasts/eccc_geps/GEPS_latest.ncml"

file_location = urljoin(directory, "GEPS_latest.ncml")
file_url = urljoin(thredds, file_location)
ds = xr.open_dataset(file_url)
# Here we also extract the times at 6-hour intervals as Raven must have
# constant timesteps and GEPS goes to 6 hours
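The 6-hour subsetting mentioned in the comment can be sketched with stdlib datetimes (the 3-hourly sample axis is illustrative; the real GEPS time coordinate comes from the opened dataset):

```python
from datetime import datetime, timedelta

# Mock 3-hourly forecast time axis: 00h, 03h, ..., 21h.
start = datetime(2023, 1, 1)
times = [start + timedelta(hours=3 * i) for i in range(8)]

# Keep only timestamps on a constant 6-hour step, as Raven requires.
six_hourly = [t for t in times if t.hour % 6 == 0]
print([t.hour for t in six_hourly])  # [0, 6, 12, 18]
```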
@@ -130,9 +176,10 @@ def get_subsetted_forecast(
times: Union[dt.datetime, xr.DataArray],
is_caspar: bool,
) -> xr.Dataset:
"""
"""Get Subsetted Forecast.

This function takes a dataset, a region and the time sampling array and returns
the subsetted values for the given region and times
the subsetted values for the given region and times.

Parameters
----------
Expand All @@ -143,14 +190,12 @@ def get_subsetted_forecast(
times : dt.datetime or xr.DataArray
The array of times required to do the forecast.
is_caspar : bool
True if the data comes from Caspar, false otherwise.
Used to define lat/lon on rotated grid.
True if the data comes from Caspar, false otherwise. Used to define lat/lon on rotated grid.

Returns
-------
xr.Dataset
The forecast dataset.

"""
# Extract the bounding box to subset the entire forecast grid to something
# more manageable
26 changes: 19 additions & 7 deletions ravenpy/utilities/forecasting.py
@@ -6,9 +6,11 @@
"""
import datetime as dt
import logging
import os
import tempfile
from pathlib import Path
from typing import List, Union
from urllib.parse import urlparse

import climpred
import xarray as xr
@@ -20,6 +22,12 @@

LOGGER = logging.getLogger("PYWPS")

# Do not remove the trailing / otherwise `urljoin` will remove the thredds path.
# Can be set at runtime with `$ env RAVENPY_THREDDS_URL=https://xx.yy.zz/thredds/ ...`.
THREDDS_URL = os.environ.get(
"RAVENPY_THREDDS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
)


def climatology_esp(
config,
@@ -391,9 +399,10 @@ def ensemble_prediction(
hindcast_from_meteo_forecast = ensemble_prediction


def compute_forecast_flood_risk(forecast: xr.Dataset, flood_level: float):
"""Returns the empirical exceedance probability for each forecast day based
on a flood level threshold.
def compute_forecast_flood_risk(
forecast: xr.Dataset, flood_level: float, thredds: str = THREDDS_URL
) -> xr.Dataset:
"""Returns the empirical exceedance probability for each forecast day based on a flood level threshold.

Parameters
----------
@@ -402,6 +411,8 @@ def compute_forecast_flood_risk(forecast: xr.Dataset, flood_level: float):
flood_level : float
Flood level threshold. Will be used to determine if forecasts exceed
this specified flood threshold. Should be in the same units as the forecasted streamflow.
thredds : str
The thredds server url. Default: "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"

Returns
-------
@@ -429,12 +440,13 @@ def compute_forecast_flood_risk(forecast: xr.Dataset, flood_level: float):
forecast.where(forecast > flood_level).notnull() / 1.0
) # This is needed to return values instead of floats

domain = urlparse(thredds).netloc

out = pct.to_dataset(name="exceedance_probability")
out.attrs["source"] = "PAVICS-Hydro flood risk forecasting tool, pavics.ouranos.ca"
out.attrs["source"] = f"PAVICS-Hydro flood risk forecasting tool, {domain}"
out.attrs["history"] = (
"File created on "
+ dt.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
+ "UTC on the PAVICS-Hydro service available at pavics.ouranos.ca"
f"File created on {dt.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')} "
f"UTC on the PAVICS-Hydro service available at {domain}."
)
out.attrs[
"title"
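The `urlparse(thredds).netloc` call above is what replaces the hard-coded `pavics.ouranos.ca` in the output attributes: it extracts the host portion of whichever server URL is in effect, so the metadata stays accurate for custom deployments:

```python
from urllib.parse import urlparse

# Host extraction for the default PAVICS endpoint and a custom deployment.
default = urlparse("https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/").netloc
custom = urlparse("https://my.domain.org/thredds").netloc
print(default)  # pavics.ouranos.ca
print(custom)   # my.domain.org
```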