From dc6d2429aafbffc626cba53aaac3f6198fc37eb3 Mon Sep 17 00:00:00 2001
From: Kevin Liu
Date: Sun, 1 Sep 2024 07:32:08 -0700
Subject: [PATCH] Use `markdownlint` instead of `mdformat` (#1118)

* use markdownlint
* add .markdownlint.yaml
* fix make lint
* use 4 space indent
* fix
---
 .markdownlint.yaml            | 26 ++++++++++++++++++++++++
 .pre-commit-config.yaml       | 14 ++++----------
 mkdocs/docs/SUMMARY.md        |  1 +
 mkdocs/docs/api.md            | 37 +++++++++++++++++------------------
 mkdocs/docs/configuration.md  | 20 ++++++++++---------
 mkdocs/docs/contributing.md   | 10 +++++-----
 mkdocs/docs/how-to-release.md |  8 ++++----
 mkdocs/docs/index.md          |  2 +-
 mkdocs/docs/verify-release.md | 10 ++++++----
 9 files changed, 76 insertions(+), 52 deletions(-)
 create mode 100644 .markdownlint.yaml

diff --git a/.markdownlint.yaml b/.markdownlint.yaml
new file mode 100644
index 0000000000..42f210c6e4
--- /dev/null
+++ b/.markdownlint.yaml
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Default state for all rules
+default: true
+
+# MD013/line-length - Line length
+MD013: false
+
+# MD007/ul-indent - Unordered list indentation
+MD007:
+  indent: 4
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 4531e45da6..10540a6b52 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -46,17 +46,11 @@ repos:
    hooks:
      - id: pycln
        args: [--config=pyproject.toml]
-  - repo: https://github.com/executablebooks/mdformat
-    rev: 0.7.17
+  - repo: https://github.com/igorshubovych/markdownlint-cli
+    rev: v0.41.0
    hooks:
-      - id: mdformat
-        additional_dependencies:
-          - mdformat-black==0.1.1
-          - mdformat-config==0.1.3
-          - mdformat-beautysh==0.1.1
-          - mdformat-admon==1.0.1
-          - mdformat-mkdocs==1.0.1
-          - mdformat-frontmatter==2.0.1
+      - id: markdownlint
+        args: ["--fix"]
  - repo: https://github.com/pycqa/pydocstyle
    rev: 6.3.0
    hooks:
diff --git a/mkdocs/docs/SUMMARY.md b/mkdocs/docs/SUMMARY.md
index 5cf753d4c3..15f74931ce 100644
--- a/mkdocs/docs/SUMMARY.md
+++ b/mkdocs/docs/SUMMARY.md
@@ -18,6 +18,7 @@

+# Summary
- [Getting started](index.md)
- [Configuration](configuration.md)
diff --git a/mkdocs/docs/api.md b/mkdocs/docs/api.md
index 595b2ae5da..53a7846be3 100644
--- a/mkdocs/docs/api.md
+++ b/mkdocs/docs/api.md
@@ -280,7 +280,7 @@ tbl.overwrite(df)

The data is written to the table, and when the table is read using `tbl.scan().to_arrow()`:

-```
+```python
pyarrow.Table
city: string
lat: double
long: double
@@ -303,7 +303,7 @@ tbl.append(df)

When reading the table `tbl.scan().to_arrow()` you can see that `Groningen` is now also part of the table:

-```
+```python
pyarrow.Table
city: string
lat: double
long: double
@@ -342,7 +342,7 @@ tbl.delete(delete_filter="city == 'Paris'")

In the above example, any records where the city field value equals `Paris` will be deleted.
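The filter can also be written with PyIceberg's expression DSL rather than a string. A minimal sketch, assuming your PyIceberg version's `delete` accepts a `BooleanExpression` the same way `scan` does:

```python
from pyiceberg.expressions import EqualTo

# Same predicate as the string form above, so the table ends up in the
# same state and the scan output below is unchanged.
tbl.delete(delete_filter=EqualTo("city", "Paris"))
```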
Running `tbl.scan().to_arrow()` will now yield:

-```
+```python
pyarrow.Table
city: string
lat: double
long: double
@@ -362,7 +362,6 @@
To explore the table metadata, tables can be inspected.

!!! tip "Time Travel"
    To inspect a table's metadata with the time travel feature, call the inspect table method with the `snapshot_id` argument.
    Time travel is supported on all metadata tables except `snapshots` and `refs`.
-
    ```python
    table.inspect.entries(snapshot_id=805611270568163028)
    ```
@@ -377,7 +376,7 @@ Inspect the snapshots of the table:
table.inspect.snapshots()
```

-```
+```python
pyarrow.Table
committed_at: timestamp[ms] not null
snapshot_id: int64 not null
@@ -405,7 +404,7 @@ Inspect the partitions of the table:
table.inspect.partitions()
```

-```
+```python
pyarrow.Table
partition: struct<dt_month: int32, dt_day: date32[day]> not null
  child 0, dt_month: int32
@@ -446,7 +445,7 @@ To show all the table's current manifest entries for both data and delete files.
table.inspect.entries()
```

-```
+```python
pyarrow.Table
status: int8 not null
snapshot_id: int64 not null
@@ -604,7 +603,7 @@ To show a table's known snapshot references:
table.inspect.refs()
```

-```
+```python
pyarrow.Table
name: string not null
type: string not null
@@ -629,7 +628,7 @@ To show a table's current file manifests:
table.inspect.manifests()
```

-```
+```python
pyarrow.Table
content: int8 not null
path: string not null
@@ -679,7 +678,7 @@ To show table metadata log entries:
table.inspect.metadata_log_entries()
```

-```
+```python
pyarrow.Table
timestamp: timestamp[ms] not null
file: string not null
@@ -702,7 +701,7 @@ To show a table's history:
table.inspect.history()
```

-```
+```python
pyarrow.Table
made_current_at: timestamp[ms] not null
snapshot_id: int64 not null
@@ -723,7 +722,7 @@ Inspect the data files in the current snapshot of the table:
table.inspect.files()
```

-```
+```python
pyarrow.Table
content: int8 not null
file_path: string not null
@@ -850,7 +849,7 @@ readable_metrics: [

Expert Iceberg users may choose to commit existing parquet files to the Iceberg table as data files, without rewriting them.

-```
+```python
# Given that these parquet files have schema consistent with the Iceberg table

file_paths = [
@@ -930,7 +929,7 @@ with table.update_schema() as update:

Now the table has the union of the two schemas, as `print(table.schema())` shows:

-```
+```python
table {
  1: city: optional string
  2: lat: optional double
@@ -1180,7 +1179,7 @@ table.scan(

This will return a PyArrow table:

-```
+```python
pyarrow.Table
VendorID: int64
tpep_pickup_datetime: timestamp[us, tz=+00:00]
@@ -1222,7 +1221,7 @@ table.scan(

This will return a Pandas dataframe:

-```
+```python
   VendorID      tpep_pickup_datetime     tpep_dropoff_datetime
0         2 2021-04-01 00:28:05+00:00 2021-04-01 00:47:59+00:00
1         1 2021-04-01 00:39:01+00:00 2021-04-01 00:57:39+00:00
@@ -1295,7 +1294,7 @@ ray_dataset = table.scan(

This will return a Ray dataset:

-```
+```python
Dataset(
   num_blocks=1,
   num_rows=1168798,
@@ -1346,7 +1345,7 @@ df = df.select("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime")

This returns a Daft Dataframe which is lazily materialized.
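No data files are read at this point. As a minimal sketch of the deferred execution (the `cutoff` timestamp and `recent` variable are illustrative, and this assumes Daft coerces a Python `datetime` into a timestamp literal):

```python
import datetime

# Planned only: the filter is recorded, not executed.
cutoff = datetime.datetime(2021, 4, 1, tzinfo=datetime.timezone.utc)
recent = df.where(df["tpep_pickup_datetime"] > cutoff)

# Data is read only on a materializing call such as collect() or show().
recent.collect()
```

Each operation returns a new lazy DataFrame, so `df` itself is left untouched.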
Printing `df` will display the schema:

-```
+```python
╭──────────┬───────────────────────────────┬───────────────────────────────╮
│ VendorID ┆ tpep_pickup_datetime          ┆ tpep_dropoff_datetime         │
│ ---      ┆ ---                           ┆ ---                           │
@@ -1364,7 +1363,7 @@ This is correctly optimized to take advantage of Iceberg features such as hidden
df.show(2)
```

-```
+```python
╭──────────┬───────────────────────────────┬───────────────────────────────╮
│ VendorID ┆ tpep_pickup_datetime          ┆ tpep_dropoff_datetime         │
│ ---      ┆ ---                           ┆ ---                           │
diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md
index dc67b79044..422800675f 100644
--- a/mkdocs/docs/configuration.md
+++ b/mkdocs/docs/configuration.md
@@ -22,6 +22,8 @@ hide:
- under the License.
-->

+# Configuration
+
## Tables

Iceberg tables support table properties to configure table behavior.

@@ -77,15 +79,15 @@ For the FileIO there are several configuration options available:

| Key | Example | Description |
| -------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| s3.endpoint | https://10.0.19.25/ | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
+| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
| s3.signer | bearer | Configure the signature version of the FileIO. |
-| s3.signer.uri | http://my.signer:8080/s3 | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
+| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default: v1/aws/s3/sign). |
| s3.region | us-west-2 | Sets the region of the bucket |
-| s3.proxy-uri | http://my.proxy.com:8080 | Configure the proxy server to be used by the FileIO. |
+| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |

@@ -96,7 +98,7 @@ For the FileIO there are several configuration options available:

| Key | Example | Description |
| -------------------- | ------------------- | ------------------------------------------------ |
-| hdfs.host | https://10.0.19.25/ | Configure the HDFS host to connect to |
+| hdfs.host | <https://10.0.19.25/> | Configure the HDFS host to connect to |
| hdfs.port | 9000 | Configure the HDFS port to connect to. |
| hdfs.user | user | Configure the HDFS username used for connection. |
| hdfs.kerberos_ticket | kerberos_ticket | Configure the path to the Kerberos ticket cache. |
@@ -109,7 +111,7 @@ For the FileIO there are several configuration options available:

| Key | Example | Description |
| ----------------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=http://localhost/ | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
+| adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=<http://localhost/> | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
| adlfs.account-name | devstoreaccount1 | The account that you want to connect to |
| adlfs.account-key | Eby8vdM02xNOcqF... | The key to authenticate against the account. |
| adlfs.sas-token | NuHOuuzdQN7VRM%2FOpOeqBlawRCA845IY05h9eu1Yte4%3D | The shared access signature |
@@ -133,7 +135,7 @@ For the FileIO there are several configuration options available:
| gcs.cache-timeout | 60 | Configure the cache expiration time in seconds for object metadata cache |
| gcs.requester-pays | False | Configure whether to use requester-pays requests |
| gcs.session-kwargs | {} | Configure a dict of parameters to pass on to aiohttp.ClientSession; can contain, for example, proxy settings. |
-| gcs.endpoint | http://0.0.0.0:4443 | Configure an alternative endpoint for the GCS FileIO to access (format protocol://host:port). If not given, defaults to the value of environment variable "STORAGE_EMULATOR_HOST"; if that is not set either, will use the standard Google endpoint. |
+| gcs.endpoint | <http://0.0.0.0:4443> | Configure an alternative endpoint for the GCS FileIO to access (format protocol://host:port). If not given, defaults to the value of environment variable "STORAGE_EMULATOR_HOST"; if that is not set either, will use the standard Google endpoint. |
| gcs.default-location | US | Configure the default location where buckets are created, like 'US' or 'EUROPE-WEST3'. |
| gcs.version-aware | False | Configure whether to support object versioning on the GCS bucket. |
@@ -200,7 +202,7 @@ catalog:

| Key | Example | Description |
| ------------------- | -------------------------------- | -------------------------------------------------------------------------------------------------- |
-| uri | https://rest-catalog/ws | URI identifying the REST Server |
+| uri | <https://rest-catalog/ws> | URI identifying the REST Server |
| ugi | t-1234:secret | Hadoop UGI for Hive client. |
| credential | t-1234:secret | Credential to use for OAuth2 credential flow when initializing the catalog |
| token | FEW23.DFSDF.FSDF | Bearer token value to use for `Authorization` header |
@@ -210,7 +212,7 @@
| rest.sigv4-enabled | true | Sign requests to the REST Server using AWS SigV4 protocol |
| rest.signing-region | us-east-1 | The region to use when SigV4 signing a request |
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
-| oauth2-server-uri | https://auth-service/cc | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
+| oauth2-server-uri | <https://auth-service/cc> | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |

@@ -325,7 +327,7 @@ catalog:
| ---------------------- | ------------------------------------ | ------------------------------------------------------------------------------- |
| glue.id | 111111111111 | Configure the 12-digit ID of the Glue Catalog |
| glue.skip-archive | true | Configure whether to skip the archival of older table versions. Defaults to true |
-| glue.endpoint | https://glue.us-east-1.amazonaws.com | Configure an alternative endpoint of the Glue service for GlueCatalog to access |
+| glue.endpoint | <https://glue.us-east-1.amazonaws.com> | Configure an alternative endpoint of the Glue service for GlueCatalog to access |
| glue.profile-name | default | Configure the static profile used to access the Glue Catalog |
| glue.region | us-east-1 | Set the region of the Glue Catalog |
| glue.access-key-id | admin | Configure the static access key id used to access the Glue Catalog |
diff --git a/mkdocs/docs/contributing.md b/mkdocs/docs/contributing.md
index a716f13479..d87f2ec6a3 100644
--- a/mkdocs/docs/contributing.md
+++ b/mkdocs/docs/contributing.md
@@ -70,7 +70,7 @@ pip3 install -e ".[s3fs,hive]"

Install it directly from GitHub (not recommended), but sometimes handy:

-```
+```shell
pip install "git+https://github.com/apache/iceberg-python.git#egg=pyiceberg[pyarrow]"
```

@@ -121,13 +121,13 @@ make test-adlfs

To pass additional arguments to pytest, you can use `PYTEST_ARGS`.

-_Run pytest in verbose mode_
+### Run pytest in verbose mode

```sh
make test PYTEST_ARGS="-v"
```

-_Run pytest with pdb enabled_
+### Run pytest with pdb enabled

```sh
make test PYTEST_ARGS="--pdb"
```

@@ -176,7 +176,7 @@ def load_something():

Which will warn:

-```
+```text
Call to load_something, deprecated in 0.1.0, will be removed in 0.2.0. Please use load_something_else() instead.
```

@@ -194,7 +194,7 @@ deprecation_message(

Which will warn:

-```
+```text
Deprecated in 0.1.0, will be removed in 0.2.0. The old_property is deprecated. Please use the something_else property instead.
```

diff --git a/mkdocs/docs/how-to-release.md b/mkdocs/docs/how-to-release.md
index 9534b1fe3b..826d01df44 100644
--- a/mkdocs/docs/how-to-release.md
+++ b/mkdocs/docs/how-to-release.md
@@ -81,7 +81,7 @@ export GIT_TAG_HASH=${GIT_TAG_REF:0:40}
export LAST_COMMIT_ID=$(git rev-list ${GIT_TAG} 2> /dev/null | head -n 1)
```

-The `-s` option will sign the commit. If you don't have a key yet, you can find the instructions [here](http://www.apache.org/dev/openpgp.html#key-gen-generate-key). To install gpg on an M1-based Mac, a couple of additional steps are required: https://gist.github.com/phortuin/cf24b1cca3258720c71ad42977e1ba57.
+The `-s` option will sign the commit. If you don't have a key yet, you can find the instructions [here](http://www.apache.org/dev/openpgp.html#key-gen-generate-key). To install gpg on an M1-based Mac, a couple of additional steps are required: <https://gist.github.com/phortuin/cf24b1cca3258720c71ad42977e1ba57>.
If you have not published your GPG key in [KEYS](https://dist.apache.org/repos/dist/dev/iceberg/KEYS) yet, you must publish it before sending the vote email by doing:

```bash
@@ -193,7 +193,7 @@ cat release-announcement-email.txt

Once the vote has passed, you can close the vote thread by concluding it:

-```
+```text
Thanks everyone for voting! The 72 hours have passed, and a minimum of 3 binding votes have been cast:

+1 Foo Bar (non-binding)
@@ -207,7 +207,7 @@ Kind regards,

### Copy the artifacts to the release dist

-```
+```bash
export RC=rc2
export VERSION=0.7.0${RC}
export VERSION_WITHOUT_RC=${VERSION/rc?/}
@@ -237,7 +237,7 @@ twine upload pyiceberg-*.whl pyiceberg-*.tar.gz

Send out an announcement on the dev mailing list:

-```
+```text
To: dev@iceberg.apache.org
Subject: [ANNOUNCE] Apache PyIceberg release

diff --git a/mkdocs/docs/index.md b/mkdocs/docs/index.md
index 1fee9cc69b..66b86a9b62 100644
--- a/mkdocs/docs/index.md
+++ b/mkdocs/docs/index.md
@@ -148,7 +148,7 @@ print(table.scan().to_arrow())

And the new column is there:

-```
+```python
taxi_dataset(
  1: VendorID: optional long,
  2: tpep_pickup_datetime: optional timestamp,
diff --git a/mkdocs/docs/verify-release.md b/mkdocs/docs/verify-release.md
index aff463c58d..4d10c9500d 100644
--- a/mkdocs/docs/verify-release.md
+++ b/mkdocs/docs/verify-release.md
@@ -84,7 +84,7 @@ cd pyiceberg-${PYICEBERG_RELEASE_VERSION}

Run RAT checks to validate license headers:

-```
+```shell
./dev/check-license
```

@@ -117,8 +117,10 @@ This will spin up Docker containers to facilitate running test coverage.

Votes are cast by replying to the release candidate announcement email on the dev mailing list with either `+1`, `0`, or `-1`. For example:

-> \[ \] +1 Release this as PyIceberg 0.3.0
-> \[ \] +0
-> \[ \] -1 Do not release this because…
+> [ ] +1 Release this as PyIceberg 0.3.0 +> +> [ ] +0 +> +> [ ] -1 Do not release this because… In addition to your vote, it’s customary to specify if your vote is binding or non-binding. Only members of the Project Management Committee have formally binding votes. If you’re unsure, you can specify that your vote is non-binding. To read more about voting in the Apache framework, checkout the [Voting](https://www.apache.org/foundation/voting.html) information page on the Apache foundation’s website.