Bump kedro-datasets to 2.0.0 #3405

Merged · 10 commits · Dec 12, 2023
27 changes: 7 additions & 20 deletions .readthedocs.yml

```diff
@@ -8,20 +8,13 @@ version: 2
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.8"
+    python: "3.9"
     nodejs: "19"
   apt_packages:
     - libasound2
   jobs:
     post_create_environment:
       - npm install -g @mermaid-js/mermaid-cli
-    pre_install:
-      # pip==23.2 breaks pip-tools<7.0, and pip-tools>=7.0 does not support Python 3.7
-      # pip==23.3 breaks dependency resolution
-      - python -m pip install -U "pip>=21.2,<23.2"
-      # These are technically installation steps, due to RTD's limit we need to inject the installation earlier.
-      - python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
-      - python -m pip install --upgrade --upgrade-strategy only-if-needed --no-cache-dir .[docs,test]
     pre_build:
       - pip freeze
       - python -m sphinx -WETan -j auto -D language=en -b linkcheck -d _build/doctrees docs/source _build/linkcheck
@@ -32,15 +25,9 @@ sphinx:
   configuration: docs/source/conf.py
   fail_on_warning: true
 
-# Build documentation with MkDocs
-# mkdocs:
-#   configuration: mkdocs.yml
-
-# Optionally set the version of Python and requirements required to build your docs
-# python:
-#   install:
-#     - method: pip
-#       path: .
-#       extra_requirements:
-#         - docs
-#         - test
+python:
+  install:
+    - method: pip
+      path: .
+      extra_requirements:
+        - docs
```
1 change: 1 addition & 0 deletions RELEASE.md

```diff
@@ -17,6 +17,7 @@
 * Removed `pip-tools` as a dependency.
 * Accepted path-like filepaths more broadly for datasets.
 * Removed support for defining the `layer` attribute at top-level within DataCatalog.
+* Bumped `kedro-datasets` to latest `2.0.0`.
 
 ## Breaking changes to the API
 * Renamed the `data_sets` argument and the `_data_sets` attribute in `Catalog` and their references to `datasets` and `_datasets` respectively.
```
2 changes: 1 addition & 1 deletion docs/source/conf.py

```diff
@@ -68,7 +68,7 @@
 
 intersphinx_mapping = {
     "kedro-viz": ("https://docs.kedro.org/projects/kedro-viz/en/v6.6.1/", None),
-    "kedro-datasets": ("https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-1.8.0/", None),
+    "kedro-datasets": ("https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-2.0.0/", None),
 }
 
 # The suffix(es) of source filenames.
```
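The second element of each mapping tuple (`None`) tells Sphinx's intersphinx extension to fetch the `objects.inv` inventory from its default location under the base URL. A small stdlib sketch of where the bumped 2.0.0 base resolves to:

```python
from urllib.parse import urljoin

# intersphinx appends "objects.inv" to the configured base URL
base = "https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-2.0.0/"
inventory_url = urljoin(base, "objects.inv")
print(inventory_url)
# https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-2.0.0/objects.inv
```

This is why the mapping key must point at a published, versioned docs build: if the `kedro-datasets-2.0.0` path does not exist yet, the inventory fetch (and hence `linkcheck`) fails.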
2 changes: 1 addition & 1 deletion docs/source/data/advanced_data_catalog_usage.md

````diff
@@ -11,7 +11,7 @@ From version **`2.0.0`** of `kedro-datasets`, all dataset names have changed to
 
 To use the `DataCatalog` API, construct a `DataCatalog` object programmatically in a file like `catalog.py`.
 
-In the following code, we use several pre-built data loaders documented in the {doc}`kedro-datasets documentation<kedro-datasets:kedro_datasets>`.
+In the following code, we use several pre-built data loaders documented in the {py:mod}`kedro-datasets documentation <kedro-datasets:kedro_datasets>`.
 
 ```python
 from kedro.io import DataCatalog
````
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md

```diff
@@ -34,7 +34,7 @@ shuttles:
 
 Kedro supports a range of connectors, for CSV files, Excel spreadsheets, Parquet files, Feather files, HDF5 files, JSON documents, pickled objects, SQL tables, SQL queries, and more. They are supported using libraries such as pandas, PySpark, NetworkX, and Matplotlib.
 
-{doc}`The kedro-datasets package documentation<kedro-datasets:kedro_datasets>` contains a comprehensive list of all available file types.
+{py:mod}`The kedro-datasets package documentation <kedro-datasets:kedro_datasets>` contains a comprehensive list of all available file types.
 
 ### Dataset `filepath`
 
```
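The connectors described in that context pair with `catalog.yml` entries. A minimal sketch using the 2.0.0 class naming (`pandas.CSVDataset`; the 1.x spelling was `pandas.CSVDataSet`) — the dataset name and filepath below are illustrative, not from the PR:

```yaml
companies:
  type: pandas.CSVDataset              # >= 2.0.0 spelling; was pandas.CSVDataSet in 1.x
  filepath: data/01_raw/companies.csv  # illustrative path
  load_args:
    sep: ","
```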
2 changes: 1 addition & 1 deletion docs/source/data/how_to_create_a_custom_dataset.md

```diff
@@ -1,6 +1,6 @@
 # Advanced: Tutorial to create a custom dataset
 
-{doc}`Kedro supports many datasets<kedro-datasets:kedro_datasets>` out of the box, but you may find that you need to create a custom dataset. For example, you may need to handle a proprietary data format or filesystem in your pipeline, or perhaps you have found a particular use case for a dataset that Kedro does not support. This tutorial explains how to create a custom dataset to read and save image data.
+{py:mod}`Kedro supports many datasets <kedro-datasets:kedro_datasets>` out of the box, but you may find that you need to create a custom dataset. For example, you may need to handle a proprietary data format or filesystem in your pipeline, or perhaps you have found a particular use case for a dataset that Kedro does not support. This tutorial explains how to create a custom dataset to read and save image data.
 
 ## AbstractDataset
 
```

> [vale] notice on docs/source/data/how_to_create_a_custom_dataset.md#L3: [Kedro.sentencelength] Try to keep your sentence length to 30 words or fewer.
2 changes: 1 addition & 1 deletion docs/source/data/index.md

```diff
@@ -3,7 +3,7 @@
 
 In a Kedro project, the Data Catalog is a registry of all data sources available for use by the project. The catalog is stored in a YAML file (`catalog.yml`) that maps the names of node inputs and outputs as keys in the `DataCatalog` class.
 
-The {doc}`kedro-datasets<kedro-datasets:kedro_datasets>` package offers built-in datasets for common file types and file systems.
+The {py:mod}`kedro-datasets <kedro-datasets:kedro_datasets>` package offers built-in datasets for common file types and file systems.
 
 We first introduce the basic sections of `catalog.yml`, which is the file used to register data sources for a Kedro project.
 
```
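The change repeated across these documentation files swaps the generic `{doc}` role, which resolves its target as a document page, for `{py:mod}`, which resolves it as a Python module object in the intersphinx inventory. Schematically, with both forms shown for contrast:

```md
<!-- before: target resolved as a *document* -->
{doc}`kedro-datasets<kedro-datasets:kedro_datasets>`

<!-- after: target resolved as a *Python module*;
     note the space before < in the new form -->
{py:mod}`kedro-datasets <kedro-datasets:kedro_datasets>`
```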
4 changes: 2 additions & 2 deletions docs/source/extend_kedro/common_use_cases.md

```diff
@@ -4,15 +4,15 @@
 
 ## Use Case 1: How to add extra behaviour to Kedro's execution timeline
 
-The execution timeline of a Kedro pipeline can be thought of as a sequence of actions performed by various Kedro library components, such as the {doc}`datasets<kedro-datasets:kedro_datasets>`, [DataCatalog](/kedro.io.DataCatalog), [Pipeline](/kedro.pipeline.Pipeline), [Node](/kedro.pipeline.node.Node) and [KedroContext](/kedro.framework.context.KedroContext).
+The execution timeline of a Kedro pipeline can be thought of as a sequence of actions performed by various Kedro library components, such as the {py:mod}`datasets <kedro-datasets:kedro_datasets>`, [DataCatalog](/kedro.io.DataCatalog), [Pipeline](/kedro.pipeline.Pipeline), [Node](/kedro.pipeline.node.Node) and [KedroContext](/kedro.framework.context.KedroContext).
 
 At different points in the lifecycle of these components, you might want to add extra behaviour: for example, you could add extra computation for profiling purposes _before_ and _after_ a node runs, or _before_ and _after_ the I/O actions of a dataset, namely the `load` and `save` actions.
 
 This can now achieved by using [Hooks](../hooks/introduction.md), to define the extra behaviour and when in the execution timeline it should be introduced.
 
 ## Use Case 2: How to integrate Kedro with additional data sources
 
-You can use {doc}`datasets<kedro-datasets:kedro_datasets>` to interface with various different data sources. If the data source you plan to use is not supported out of the box by Kedro, you can [create a custom dataset](../data/how_to_create_a_custom_dataset.md).
+You can use {py:mod}`datasets <kedro-datasets:kedro_datasets>` to interface with various different data sources. If the data source you plan to use is not supported out of the box by Kedro, you can [create a custom dataset](../data/how_to_create_a_custom_dataset.md).
 
 ## Use Case 3: How to add or modify CLI commands
 
```

> [vale] notice on docs/source/extend_kedro/common_use_cases.md#L7: [Kedro.sentencelength] Try to keep your sentence length to 30 words or fewer.
> [vale] warning on docs/source/extend_kedro/common_use_cases.md#L15: [Kedro.toowordy] 'various different' is too wordy.
2 changes: 1 addition & 1 deletion docs/source/faq/faq.md

```diff
@@ -10,7 +10,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]
 
 ## Kedro documentation
 * {doc}`Where can I find the documentation about Kedro-Viz<kedro-viz:kedro-viz_visualisation>`?
-* {doc}`Where can I find the documentation for Kedro's datasets<kedro-datasets:kedro_datasets>`?
+* {py:mod}`Where can I find the documentation for Kedro's datasets <kedro-datasets:kedro_datasets>`?
 
 ## Working with Jupyter
 
```
2 changes: 1 addition & 1 deletion docs/source/get_started/kedro_concepts.md

```diff
@@ -57,7 +57,7 @@
 
 The Kedro Data Catalog is the registry of all data sources that the project can use to manage loading and saving data. It maps the names of node inputs and outputs as keys in a `DataCatalog`, a Kedro class that can be specialised for different types of data storage.
 
-{doc}`Kedro provides different built-in datasets<kedro-datasets:kedro_datasets>` for numerous file types and file systems, so you don’t have to write the logic for reading/writing data.
+{py:mod}`Kedro provides different built-in datasets <kedro-datasets:kedro_datasets>` for numerous file types and file systems, so you don’t have to write the logic for reading/writing data.
 
 ## Kedro project directory structure
 
```

> [vale] warning on docs/source/get_started/kedro_concepts.md#L60: [Kedro.toowordy] 'numerous' is too wordy.
2 changes: 1 addition & 1 deletion docs/source/tutorial/set_up_data.md

```diff
@@ -118,7 +118,7 @@ When you have finished, close `ipython` session with `exit()`.
 
 ### Custom data
 
-{doc}`Kedro supports numerous datasets<kedro-datasets:kedro_datasets>` out of the box, but you can also add support for any proprietary data format or filesystem.
+{py:mod}`Kedro supports numerous datasets <kedro-datasets:kedro_datasets>` out of the box, but you can also add support for any proprietary data format or filesystem.
 
 You can find further information about [how to add support for custom datasets](../data/how_to_create_a_custom_dataset.md) in specific documentation covering advanced usage.
 
```
7 changes: 2 additions & 5 deletions pyproject.toml

```diff
@@ -79,14 +79,12 @@ test = [
     "pytest-mock>=1.7.1, <2.0",
     "pytest-xdist[psutil]~=2.2.1",
     "pytest~=7.2",
-    "s3fs>=0.3.0, <0.5", # Needs to be at least 0.3.0 to make use of `cachable` attribute on S3FileSystem.
+    "s3fs>=2021.4, <2024.1", # Upper bound set arbitrarily, to be reassessed in early 2024
     "semver",
     "trufflehog~=2.1",
 ]
 docs = [
-    # docutils>=0.17 changed the HTML
-    # see https://github.com/readthedocs/sphinx_rtd_theme/issues/1115
-    "docutils==0.16",
+    "docutils<0.18",
     "sphinx~=5.3.0",
     "sphinx_rtd_theme==1.2.0",
     # Regression on sphinx-autodoc-typehints 1.21
@@ -98,7 +96,6 @@ docs = [
     "sphinxcontrib-mermaid~=0.7.1",
     "myst-parser~=1.0.0",
     "Jinja2<3.1.0",
-    "kedro-datasets[all]~=1.8.0",
 ]
 all = [ "kedro[test,docs]" ]
 
```
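The `s3fs` pin moves from the old 0.x scheme to fsspec-style calendar versioning. As a rough illustration only — a hypothetical helper comparing year/month tuples, not pip's real PEP 440 resolution — the new bound `>=2021.4, <2024.1` behaves like this:

```python
# Hypothetical helper sketching the effect of "s3fs>=2021.4, <2024.1";
# it is not part of the Kedro codebase and ignores pre-releases etc.
def in_bounds(version: str, lower=(2021, 4), upper=(2024, 1)) -> bool:
    # Compare only the (year, month) prefix of a CalVer string.
    parts = tuple(int(p) for p in version.split("."))
    return lower <= parts[:2] < upper

print(in_bounds("2023.12.2"))  # True: a CalVer release inside the window
print(in_bounds("0.4.2"))      # False: pre-CalVer releases are excluded
print(in_bounds("2024.2.0"))   # False: beyond the arbitrary upper bound
```

The upper bound is explicitly a placeholder: the inline comment in the diff flags it for reassessment in early 2024.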