Disable dataset existence checks for unavailable sources #1645

Closed
PabloDeAlbu opened this issue Nov 19, 2023 · 1 comment · Fixed by #1659
Labels
Issue: Bug Report Python Pull requests that update Python code

Comments

@PabloDeAlbu

Description

I'm having issues with Kedro-Viz version 6.7.0. My catalog defines a source that I can only reach through a VPN. I use this source only for data extraction, so the VPN is not always enabled.

When I try to start the Kedro Viz server without the VPN enabled, the following exception is raised:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/kedro/.venv/lib/python3.10/site-packages/kedro_viz/server.py", line 100, in run_server
    populate_data(
  File "/kedro/.venv/lib/python3.10/site-packages/kedro_viz/server.py", line 40, in populate_data
    data_access_manager.resolve_dataset_factory_patterns(catalog, pipelines)
  File "/kedro/.venv/lib/python3.10/site-packages/kedro_viz/data_access/managers.py", line 85, in resolve_dataset_factory_patterns
    catalog.exists(dataset_name)
  File "/kedro/.venv/lib/python3.10/site-packages/kedro/io/data_catalog.py", line 404, in exists
    return dataset.exists()
  File "/kedro/.venv/lib/python3.10/site-packages/kedro/io/core.py", line 313, in exists
    raise DatasetError(message) from exc
kedro.io.core.DatasetError: Failed during exists check for data set SQLTableDataSet(load_args={}, save_args={'index': False}, table_name=community2collection).
(psycopg2.OperationalError) connection to server at "postgres" (xxx.xxx.xxx.xxx), port xxxx failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?

Context

I understand that the error is caused by the feature introduced in #1472. More precisely, the error occurs here:

# Creates data repositories which are used by Kedro Viz Backend APIs
populate_data(
    data_access_manager, catalog, pipelines, session_store, stats_dict
)

It seems that the issue stems from Kedro-Viz checking the existence of every dataset in the catalog in order to display statistics. The check runs at startup regardless of whether the data sources are reachable, so when one is not, the failure not only prevents viewing statistics but also stops the server from starting, which makes it impossible to visualize the pipelines at all. This was not an issue in versions prior to 6.7.0.
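To illustrate the behavior I would expect, here is a minimal sketch of an existence check that tolerates unreachable sources. It is not the Kedro-Viz implementation: `DatasetError` and `FakeCatalog` are hypothetical stand-ins for the Kedro classes, and `resolve_patterns_safely` is a name made up for this example.

```python
import logging

logger = logging.getLogger(__name__)


class DatasetError(Exception):
    """Stand-in for kedro.io.core.DatasetError (hypothetical, for illustration)."""


class FakeCatalog:
    """Hypothetical catalog whose exists() raises for unreachable sources."""

    def __init__(self, unreachable):
        self._unreachable = set(unreachable)

    def exists(self, name):
        if name in self._unreachable:
            raise DatasetError(f"Failed during exists check for data set {name}")
        return True


def resolve_patterns_safely(catalog, dataset_names):
    """Run the exists check for every dataset, skipping the ones that raise."""
    skipped = []
    for name in dataset_names:
        try:
            catalog.exists(name)
        except Exception as exc:  # broad on purpose: a backend error should not stop startup
            logger.warning("Skipping exists check for '%s': %s", name, exc)
            skipped.append(name)
    return skipped


catalog = FakeCatalog(unreachable={"community2collection"})
skipped = resolve_patterns_safely(catalog, ["community2collection", "local_csv"])
```

With a guard like this, the unreachable table would only lose its statistics, while the rest of the pipeline could still be visualized.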

Is there a way to disable this feature? I have no issues with version 6.6.1.

Steps to Reproduce

  1. Define a source of type pandas.SQLTableDataSet to which you do not have access (at least for testing purposes).
  2. Start Kedro Viz
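
The failure can also be imitated without a database. The sketch below is only an approximation of what the traceback shows: `UnreachableSQLTable`, `DatasetError`, and this simplified `populate_data` are hypothetical stand-ins, not the real Kedro or Kedro-Viz code.

```python
class DatasetError(Exception):
    """Stand-in for kedro.io.core.DatasetError (hypothetical)."""


class UnreachableSQLTable:
    """Hypothetical dataset standing in for an SQL table behind a VPN."""

    def _exists(self):
        # Imitates the driver failing to connect when the VPN is off.
        raise ConnectionError("connection to server failed: Connection refused")

    def exists(self):
        try:
            return self._exists()
        except Exception as exc:
            # Same chaining pattern as in the traceback: raise ... from exc
            message = "Failed during exists check for data set UnreachableSQLTable"
            raise DatasetError(message) from exc


def populate_data(datasets):
    """Simplified, unguarded startup step: the first failing check aborts it."""
    for dataset in datasets.values():
        dataset.exists()
    return "server started"


try:
    result = populate_data({"community2collection": UnreachableSQLTable()})
except DatasetError as err:
    result = f"startup aborted: {err}"
    cause = err.__cause__
```

Here a single unreachable dataset is enough to abort the whole startup step, which matches what I see when the VPN is disabled.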
@rashidakanchwala
Contributor

Thank you for raising this. We will look into it.

Projects
Status: Done