Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement write retry for DuckDB #128

Merged

Conversation

Sevenannn
Copy link
Contributor

This PR marks the DuckDB write failure retriable, thus allowing retries on DuckDB dataset write issues, for example, fail to get a connection due to resource starvation.

2024/10/02 12:37:01 INFO Checking for latest Spice runtime release...
2024/10/02 12:37:03 INFO Spice.ai runtime starting...
2024-10-02T19:37:04.251808Z  INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2024-10-02T19:37:04.251552Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-10-02T19:37:04.256872Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-10-02T19:37:04.262993Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2024-10-02T19:37:04.453973Z  INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2024-10-02T19:37:05.070588Z  INFO runtime: Dataset call_center registered (s3://spiceai-public-datasets/tpcds/call_center/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.072151Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset call_center
2024-10-02T19:37:05.073839Z  INFO runtime: Dataset store registered (s3://spiceai-public-datasets/tpcds/store/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.074754Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset store
2024-10-02T19:37:05.076654Z  INFO runtime: Dataset web_page registered (s3://spiceai-public-datasets/tpcds/web_page/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.078431Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset web_page
2024-10-02T19:37:05.079612Z  INFO runtime: Dataset household_demographics registered (s3://spiceai-public-datasets/tpcds/household_demographics/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.081401Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset household_demographics
2024-10-02T19:37:05.082397Z  INFO runtime: Dataset warehouse registered (s3://spiceai-public-datasets/tpcds/warehouse/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.083892Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset warehouse
2024-10-02T19:37:05.085132Z  INFO runtime: Dataset web_site registered (s3://spiceai-public-datasets/tpcds/web_site/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.085932Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset web_site
2024-10-02T19:37:05.087634Z  INFO runtime: Dataset catalog_page registered (s3://spiceai-public-datasets/tpcds/catalog_page/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.088732Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset catalog_page
2024-10-02T19:37:05.090285Z  INFO runtime: Dataset reason registered (s3://spiceai-public-datasets/tpcds/reason/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.090417Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset reason
2024-10-02T19:37:05.093874Z  INFO runtime: Dataset income_band registered (s3://spiceai-public-datasets/tpcds/income_band/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.095048Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset income_band
2024-10-02T19:37:05.123784Z  INFO runtime: Dataset promotion registered (s3://spiceai-public-datasets/tpcds/promotion/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.125034Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset promotion
2024-10-02T19:37:05.126291Z  INFO runtime: Dataset ship_mode registered (s3://spiceai-public-datasets/tpcds/ship_mode/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.126868Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset ship_mode
2024-10-02T19:37:05.130470Z  INFO runtime: Dataset time_dim registered (s3://spiceai-public-datasets/tpcds/time_dim/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.131622Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset time_dim
2024-10-02T19:37:05.155516Z  INFO runtime: Dataset customer_address registered (s3://spiceai-public-datasets/tpcds/customer_address/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.156599Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset customer_address
2024-10-02T19:37:05.166887Z  INFO runtime: Dataset customer registered (s3://spiceai-public-datasets/tpcds/customer/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.167605Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset customer
2024-10-02T19:37:05.170444Z  INFO runtime: Dataset item registered (s3://spiceai-public-datasets/tpcds/item/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.171607Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset item
2024-10-02T19:37:05.183176Z  INFO runtime: Dataset web_returns registered (s3://spiceai-public-datasets/tpcds/web_returns/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.184331Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset web_returns
2024-10-02T19:37:05.201976Z  INFO runtime: Dataset web_sales registered (s3://spiceai-public-datasets/tpcds/web_sales/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.203162Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset web_sales
2024-10-02T19:37:05.226343Z  INFO runtime: Dataset store_returns registered (s3://spiceai-public-datasets/tpcds/store_returns/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.226419Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset store_returns
2024-10-02T19:37:05.234555Z  INFO runtime: Dataset customer_demographics registered (s3://spiceai-public-datasets/tpcds/customer_demographics/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.235487Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset customer_demographics
2024-10-02T19:37:05.256129Z  INFO runtime: Dataset date_dim registered (s3://spiceai-public-datasets/tpcds/date_dim/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.257293Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset date_dim
2024-10-02T19:37:05.267866Z  INFO runtime: Dataset catalog_returns registered (s3://spiceai-public-datasets/tpcds/catalog_returns/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.268739Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset catalog_returns
2024-10-02T19:37:05.298004Z  INFO runtime: Dataset inventory registered (s3://spiceai-public-datasets/tpcds/inventory/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.299170Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset inventory
2024-10-02T19:37:05.345373Z  INFO runtime: Dataset catalog_sales registered (s3://spiceai-public-datasets/tpcds/catalog_sales/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.345429Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset catalog_sales
2024-10-02T19:37:05.348345Z  INFO runtime: Dataset store_sales registered (s3://spiceai-public-datasets/tpcds/store_sales/), acceleration (duckdb:file), results cache enabled.
2024-10-02T19:37:05.349492Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset store_sales
2024-10-02T19:37:05.822458Z  INFO runtime::accelerated_table::refresh_task: Loaded 6 rows (6.42 kiB) for dataset call_center in 750ms.
2024-10-02T19:37:05.873949Z  INFO runtime::accelerated_table::refresh_task: Loaded 35 rows (2.13 kiB) for dataset reason in 783ms.
2024-10-02T19:37:05.878050Z  INFO runtime::accelerated_table::refresh_task: Loaded 60 rows (9.07 kiB) for dataset web_page in 799ms.
2024-10-02T19:37:05.879807Z  INFO runtime::accelerated_table::refresh_task: Loaded 30 rows (15.47 kiB) for dataset web_site in 793ms.
2024-10-02T19:37:05.888504Z  INFO runtime::accelerated_table::refresh_task: Loaded 20 rows (528.00 B) for dataset income_band in 793ms.
2024-10-02T19:37:05.890588Z  INFO runtime::accelerated_table::refresh_task: Loaded 5 rows (3.23 kiB) for dataset warehouse in 806ms.
2024-10-02T19:37:05.894075Z  INFO runtime::accelerated_table::refresh_task: Loaded 12 rows (8.54 kiB) for dataset store in 819ms.
2024-10-02T19:37:05.908698Z  INFO runtime::accelerated_table::refresh_task: Loaded 300 rows (72.75 kiB) for dataset promotion in 783ms.
2024-10-02T19:37:05.934016Z  INFO runtime::accelerated_table::refresh_task: Loaded 20 rows (2.55 kiB) for dataset ship_mode in 807ms.
2024-10-02T19:37:05.978951Z  INFO runtime::accelerated_table::refresh_task: Loaded 7,200 rows (208.99 kiB) for dataset household_demographics in 897ms.
2024-10-02T19:37:35.824282Z  WARN runtime::accelerated_table::refresh: Failed to checkpoint dataset call_center: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:35.827877Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset catalog_page: External error: Arrow error: External error: External: Generic S3 error: error decoding response body
2024-10-02T19:37:35.877291Z  WARN runtime::accelerated_table::refresh: Failed to checkpoint dataset reason: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:35.879919Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset time_dim: External error: Arrow error: External error: External: Generic S3 error: error decoding response body
2024-10-02T19:37:35.880068Z  WARN runtime::accelerated_table::refresh: Failed to checkpoint dataset web_site: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:35.880571Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset item: External error: Arrow error: External error: External: Generic S3 error: error decoding response body
2024-10-02T19:37:35.880597Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset customer_address: External error: Arrow error: External error: External: Generic S3 error: error decoding response body
2024-10-02T19:37:35.884128Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset customer: External error: Arrow error: External error: External: Generic S3 error: error decoding response body
2024-10-02T19:37:35.899158Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset web_returns: External error: Arrow error: External error: External: Generic S3 error: error decoding response body
2024-10-02T19:37:36.626870Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset customer_address
2024-10-02T19:37:36.704514Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset date_dim: External error: DbConnectionError: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:36.711855Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset web_returns
2024-10-02T19:37:36.749271Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset customer_demographics: External error: DbConnectionError: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:36.760317Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset customer
2024-10-02T19:37:36.839955Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset catalog_page
2024-10-02T19:37:36.859038Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset item
2024-10-02T19:37:37.027727Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset time_dim
2024-10-02T19:37:37.420875Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset date_dim
2024-10-02T19:37:37.977620Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset customer_demographics
2024-10-02T19:37:38.197631Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset store_returns: External error: DbConnectionError: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:39.079875Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset store_returns
2024-10-02T19:37:41.299766Z  INFO runtime::accelerated_table::refresh_task: Loaded 50,000 rows (10.80 MiB) for dataset customer_address in 4s 672ms.
2024-10-02T19:37:43.270570Z  INFO runtime::accelerated_table::refresh_task: Loaded 71,654 rows (17.39 MiB) for dataset web_returns in 6s 558ms.
2024-10-02T19:37:43.276875Z  INFO runtime::accelerated_table::refresh_task: Loaded 11,718 rows (2.00 MiB) for dataset catalog_page in 6s 436ms.
2024-10-02T19:37:43.306158Z  INFO runtime::accelerated_table::refresh_task: Loaded 18,000 rows (8.66 MiB) for dataset item in 6s 447ms.
2024-10-02T19:37:43.343240Z  INFO runtime::accelerated_table::refresh_task: Loaded 86,400 rows (8.12 MiB) for dataset time_dim in 6s 315ms.
2024-10-02T19:37:43.457865Z  INFO runtime::accelerated_table::refresh_task: Loaded 100,000 rows (20.89 MiB) for dataset customer in 6s 697ms.
2024-10-02T19:37:43.523986Z  INFO runtime::accelerated_table::refresh_task: Loaded 73,049 rows (13.44 MiB) for dataset date_dim in 6s 103ms.
2024-10-02T19:37:45.922513Z  INFO runtime::accelerated_table::refresh_task: Loaded 1,920,800 rows (141.10 MiB) for dataset customer_demographics in 7s 944ms.
2024-10-02T19:37:47.676402Z  WARN runtime::accelerated_table::refresh_task: Failed to update dataset web_sales: External error: DbConnectionError: ConnectionPoolError: timed out waiting for connection
2024-10-02T19:37:48.840244Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset web_sales
2024-10-02T19:37:50.017185Z  INFO runtime::accelerated_table::refresh_task: Loaded 144,067 rows (37.51 MiB) for dataset catalog_returns in 44s 748ms.
2024-10-02T19:37:53.851937Z  INFO runtime::accelerated_table::refresh_task: Loaded 11,745,000 rows (221.47 MiB) for dataset inventory in 48s 552ms.
2024-10-02T19:37:54.907731Z  INFO runtime::accelerated_table::refresh_task: Loaded 287,867 rows (61.48 MiB) for dataset store_returns in 15s 827ms.
2024-10-02T19:37:57.997236Z  INFO runtime::accelerated_table::refresh_task: Loaded 719,384 rows (260.59 MiB) for dataset web_sales in 9s 157ms.
2024-10-02T19:38:01.048740Z  INFO runtime::accelerated_table::refresh_task: Loaded 1,441,548 rows (532.67 MiB) for dataset catalog_sales in 55s 703ms.

2024-10-02T19:38:03.317509Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,880,404 rows (746.24 MiB) for dataset store_sales in 57s 968ms.

@sgrebnov sgrebnov merged commit 826814a into datafusion-contrib:spiceai Oct 2, 2024
3 checks passed
hozan23 pushed a commit to hozan23/datafusion-table-providers that referenced this pull request Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants