Crawl Table ACLs in all databases #122

Merged: 33 commits into main, Sep 5, 2023
Conversation

@saraivdbx (Contributor) commented Aug 29, 2023

This PR enumerates all databases for GrantsCrawler.
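
At a high level, a rough sketch of the behavior this PR adds, using the per-database snapshot API discussed later in the review thread (the helper name is illustrative, not the merged code):

def grants_for_all_databases(grants_crawler, databases):
    """Illustrative helper: aggregate grants across every Hive Metastore database."""
    for db in databases:
        # snapshot() crawls a single database at a time, so iterate and merge the results
        yield from grants_crawler.snapshot("hive_metastore", db)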

bin/install.py Outdated
logger.info("Uploading...")
ws.workspace.mkdirs(folder_base)
with open(local_wheel_file, "rb") as fh:
    ws.workspace.upload(path=remote_wheel_file, content=fh.read(), format=ImportFormat.AUTO)
buf = BytesIO(INSTALL_NOTEBOOK.format(remote_wheel_file=remote_wheel_file).encode())
ws.workspace.upload(path=remote_notebook_file, content=buf)
ws.workspace.upload(path=remote_configuration_file, content=buf)
Collaborator:

content should be different :)
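
For reference, a minimal sketch of the kind of fix being asked for here; the notebook buffer mirrors the snippet above, while config_bytes is a hypothetical stand-in for however the installer serializes its configuration:

from io import BytesIO

# upload the install notebook from its own buffer
notebook_buf = BytesIO(INSTALL_NOTEBOOK.format(remote_wheel_file=remote_wheel_file).encode())
ws.workspace.upload(path=remote_notebook_file, content=notebook_buf)

# upload the configuration from a separate buffer instead of reusing the notebook one
config_bytes = b"..."  # hypothetical: the serialized installer configuration
ws.workspace.upload(path=remote_configuration_file, content=BytesIO(config_bytes))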

Contributor Author:

I will fix this on Friday

pyproject.toml (outdated, resolved)
Comment on lines 17 to 29
# trigger one-time-run job and get waiter object
waiter = w.jobs.submit(
    run_name=f"sdk-{time.time_ns()}",
    tasks=[
        j.SubmitTask(
            existing_cluster_id=cluster_id,
            python_wheel_task=j.PythonWheelTask(
                "crawler", package_name="databricks-labs-ucx", parameters=[inventory_catalog, inventory_schema]
            ),
            task_key=f"sdk-{time.time_ns()}",
        )
    ],
)
Collaborator:

Suggested change (replacing the w.jobs.submit(...) call above):

from databricks.sdk.service import jobs, compute

created_job = ws.jobs.create(tasks=[
    jobs.Task(
        task_key='crawl',
        python_wheel_task=jobs.PythonWheelTask(
            package_name='databricks.labs.ucx.toolkits.table_acls',
            entry_point='the_function_you_are_adding'
        ),
        new_cluster=jobs.ClusterSpec(
            libraries=[compute.Library(
                whl='... the location of the wheel ..'
            )],
            new_cluster=compute.ClusterSpec(
                node_type_id=ws.clusters.select_node_type(min_cores=10),
                spark_version=ws.clusters.select_spark_version(latest=True),
                num_workers=1,
                # .. other properties if needed
            )
        )
    )
])

I didn't ask for a one-time run of the job; I asked to create a persistent job. Have a function somewhere to set up this job, for example in the init script, and then use ws.jobs.run_now(created_job.job_id).result() in tests:

ws.jobs.run_now(created_job.job_id).result()

Collaborator:

This PR, databricks/databricks-sdk-py#264, will prevent this class of errors.

Contributor Author:

The suggested code throws the following error:

Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size

I believe this is caused by the member variable new_cluster of the Task dataclass in the SDK being typed as compute.ClusterSpec.
Shouldn't it be Optional['ClusterSpec'] instead?
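
For reference, a hedged sketch of a flattened task definition where libraries and the cluster spec sit directly on jobs.Task and the cluster size field is set; the job name and entry point are placeholders, not the values used in this PR:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs, compute

ws = WorkspaceClient()

created_job = ws.jobs.create(
    name="ucx-crawl-table-acls",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="crawl",
            python_wheel_task=jobs.PythonWheelTask(
                package_name="databricks.labs.ucx.toolkits.table_acls",
                entry_point="the_function_you_are_adding",  # placeholder from the suggestion above
            ),
            # libraries and new_cluster are siblings on the task, not nested inside each other
            libraries=[compute.Library(whl="... the location of the wheel ..")],
            new_cluster=compute.ClusterSpec(
                node_type_id=ws.clusters.select_node_type(min_cores=10),
                spark_version=ws.clusters.select_spark_version(latest=True),
                num_workers=1,  # satisfies the cluster "size" requirement
            ),
        )
    ],
)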

Comment on lines 97 to 98
commands = CommandExecutor(ws)
commands.install_notebook_library(f"/Workspace{wsfs_wheel}")
Collaborator:

You don't need this, as the wheel has to be installed on the job cluster, not the interactive cluster. Refactor this :)

Contributor Author:

I will complete this on Friday

inventory_schema = make_schema(catalog=make_catalog())
inventory_catalog, inventory_schema = inventory_schema.split(".")

crawl_tacl(cluster_id, ws, inventory_catalog, inventory_schema)
Collaborator:

I don't really think we need this function; ws.jobs.run_now(created_job.job_id).result() does the job just fine.
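
For context, a hedged sketch of what that looks like in an integration test; the success-state assertion is illustrative, and created_job comes from the persistent-job setup discussed above:

from databricks.sdk.service import jobs

run = ws.jobs.run_now(created_job.job_id).result()
assert run.state.result_state == jobs.RunResultState.SUCCESS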

@saraivdbx added the pr/do-not-merge label (this pull request is not ready to merge) on Sep 3, 2023

@saraivdbx (Contributor Author):

Missing:

  • config.py, failing unit test
  • database iteration
  • job creation integration test validations
  • add create_job to install.py

@renardeinside (Contributor) commented Sep 3, 2023

@nfx how is this PR related to the v0.0.3 milestone? We don't need any scheduled crawler jobs to deliver that functionality; we just need table ACL collection to run once, in parallel, during the migration. Could you please explain why scheduled crawling is required?

@nfx (Collaborator) commented Sep 4, 2023

@renardeinside it's better to kick off crawling from a job, as it's the only way to deterministically specify correct cluster settings, which is not guaranteed in the notebook.

@renardeinside (Contributor):

> @renardeinside it's better to kick off crawling from a job, as it's the only way to deterministically specify correct cluster settings, which is not guaranteed in the notebook.

We just need to specify the requirements for the cluster that is used by the notebook, nothing more than that. This adds extra complexity, requires the user to configure an additional cluster, and so on. This is a nice feature, but definitely not something on the critical path.

@nfx (Collaborator) commented Sep 4, 2023

@renardeinside this has to become self-service with minimal setup steps; this is the first step towards it :)

@renardeinside (Contributor):

Current PR state from the perspective of milestone v0.0.3:

Functional:

  • can only support one database

Tests:

  • there are some tests, but coverage is not full / complete

Non-functional:

  • scheduled crawler job

The crawler job is not a priority for the v0.0.3 milestone; it's not mentioned in any requirements, and it's just a nice-to-have from this perspective.
We should split this PR into two. The current PR should be fully oriented towards delivering the functional requirements for the v0.0.3 milestone. Everything else is optional and should be moved out of this PR. We have a very strict timeline.

@renardeinside changed the title from "Schedule crawler job" to "✨ full functional support for Table ACLs in all databases" on Sep 4, 2023
@renardeinside added this to the v0.0.3 milestone on Sep 4, 2023
@renardeinside (Contributor):

@saraivdbx please factor out of this PR any work that is related to scheduling / setting up the crawler job. In milestone v0.0.3 this code will only run from the notebook anyway.

@nfx (Collaborator) left a comment:

Please revert the database: str -> databases: list signature changes from TablesCrawler and GrantsCrawler, as it doesn't fit and introduces more risk. Instead, iterate over the databases and call the methods that were already there:

for db in databases:
    for grant in grants_crawler.snapshot('hive_metastore', db):
        yield grant

@@ -118,24 +118,20 @@ def __init__(self, tc: TablesCrawler):
super().__init__(tc._backend, tc._catalog, tc._schema, "grants")
self._tc = tc

def snapshot(self, catalog: str, database: str) -> list[Grant]:
Collaborator:

This changeset looks larger than necessary. What justifies changing the signatures of 5 downstream methods versus iterating over databases in a loop?

Doesn't this code look more maintainable in comparison?

for db in databases:
    for grant in grants_crawler.snapshot('hive_metastore', db):
        yield grant

Contributor Author:

I ended up implementing this approach because I had already attempted both versions you suggested, yield and return, and both resulted in taking a snapshot of only the first database in the iteration.

def grants_snapshot(self, schema):
    return self._gc.snapshot("hive_metastore", schema)
def grants_snapshot(self):
    return self._gc.snapshot("hive_metastore", self.databases)
Collaborator:

Why do we need to change the signatures of TablesCrawler and GrantsCrawler, touching a lot of already-tested code?

Why didn't we use this approach?

grants = []
for db in self.databases:
    for grant in grants_crawler.snapshot('hive_metastore', db):
        grants.append(grant)
return grants

Contributor Author:

Same as above.

src/databricks/labs/ucx/toolkits/table_acls.py (outdated, resolved)
tests/integration/test_tacls.py (outdated, resolved)
@nfx (Collaborator) commented Sep 4, 2023

+resolve conflicts

@nfx (Collaborator) left a comment:

I'm still not comfortable with a signature change that also introduces hidden bugs.

I think adding a small state machine to CrawlerBase would avoid changes in TablesCrawler, GrantsCrawler, and other future subclasses. Please try:

Index: src/databricks/labs/ucx/tacl/_internal.py
===================================================================
diff --git a/src/databricks/labs/ucx/tacl/_internal.py b/src/databricks/labs/ucx/tacl/_internal.py
--- a/src/databricks/labs/ucx/tacl/_internal.py	(revision a11ae80d3c7a2384dca79c70c00c3eada4a9d91f)
+++ b/src/databricks/labs/ucx/tacl/_internal.py	(date 1693919765155)
@@ -143,15 +143,25 @@
         Returns:
         list[any]: A list of data records, either fetched or loaded.
         """
+        loaded = False
+        trigger_load = ValueError('trigger records load')
         while True:
             try:
                 logger.debug(f"[{self._full_name}] fetching {self._table} inventory")
-                return list(fetcher())
+                cached_results = list(fetcher())
+                if len(cached_results) == 0 and loaded:
+                    return cached_results
+                if len(cached_results) == 0 and not loaded:
+                    raise trigger_load
+                return cached_results
             except Exception as e:
-                if "TABLE_OR_VIEW_NOT_FOUND" not in str(e):
+                if not (e == trigger_load or "TABLE_OR_VIEW_NOT_FOUND" in str(e)):
                     raise e
-                logger.debug(f"[{self._full_name}] {self._table} inventory not found, crawling")
-                self._append_records(klass, loader())
+            logger.debug(f"[{self._full_name}] crawling new batch for {self._table}")
+            loaded_records = list(loader())
+            if len(loaded_records) > 0:
+                self._append_records(klass, loaded_records)
+            loaded = True
 
     @staticmethod
     def _row_to_sql(row, fields):
Index: tests/integration/test_tacls.py
===================================================================
diff --git a/tests/integration/test_tacls.py b/tests/integration/test_tacls.py
--- a/tests/integration/test_tacls.py	(revision a11ae80d3c7a2384dca79c70c00c3eada4a9d91f)
+++ b/tests/integration/test_tacls.py	(date 1693918922272)
@@ -15,9 +15,13 @@
     schema = make_schema(catalog="hive_metastore")
     managed_table = make_table(schema=schema)
     external_table = make_table(schema=schema, external=True)
-    tmp_table = make_table(schema=schema, ctas="SELECT 2+2 AS four")
     view = make_table(schema=schema, ctas="SELECT 2+2 AS four", view=True)
-    non_delta = make_table(schema=schema, non_detla=True)
+
+    schema2 = make_schema(catalog="hive_metastore")
+    tmp_table = make_table(schema=schema2, ctas="SELECT 2+2 AS four")
+    non_delta = make_table(schema=schema2, non_detla=True)
+
+    schema3 = make_schema(catalog="hive_metastore")
 
     logger.info(
         f"managed_table={managed_table}, "
@@ -33,6 +37,10 @@
     all_tables = {}
     for t in tak.database_snapshot(schema.split(".")[1]):
         all_tables[t.key] = t
+    for t in tak.database_snapshot(schema2.split(".")[1]):
+        all_tables[t.key] = t
+    for t in tak.database_snapshot(schema3.split(".")[1]):
+        all_tables[t.key] = t # no tables in schema3
 
     assert len(all_tables) == 5
     assert all_tables[non_delta].table_format == "JSON"

src/databricks/labs/ucx/tacl/grants.py (outdated, resolved)
src/databricks/labs/ucx/tacl/grants.py (outdated, resolved)
src/databricks/labs/ucx/tacl/grants.py (outdated, resolved)
src/databricks/labs/ucx/tacl/tables.py (outdated, resolved)
@nfx changed the title from "✨ full functional support for Table ACLs in all databases" to "Crawl Table ACLs in all databases" on Sep 5, 2023
@saraivdbx (Contributor Author):

Applied suggested changes

@nfx (Collaborator) left a comment:

Still needs a few changes.

tests/integration/test_tacls.py (resolved)
src/databricks/labs/ucx/toolkits/table_acls.py (outdated, resolved)
tests/integration/test_tacls.py (resolved)
tests/integration/test_tacls.py (outdated, resolved)
@nfx removed the pr/do-not-merge label (this pull request is not ready to merge) on Sep 5, 2023
@nfx (Collaborator) left a comment:

Last comment:

log_level="DEBUG",
)
toolkit = GroupMigrationToolkit(config)
tacltoolkit = TaclToolkit(toolkit._ws, config.inventory.table.catalog, config.tacl)
Collaborator:

Suggested change
tacltoolkit = TaclToolkit(toolkit._ws, config.inventory.table.catalog, config.tacl)
tacltoolkit = TaclToolkit(toolkit._ws,
                          inventory_catalog=config.inventory.table.catalog,
                          inventory_schema=config.inventory.table.schema,
                          databases=["default"],
                          auto=False,
                          warehouse_id=None)

This is missing the inventory schema.


class TaclToolkit:
    def __init__(self, ws: WorkspaceClient, inventory_catalog, inventory_schema, warehouse_id=None):
    def __init__(
Collaborator:

Last comment: don't depend on TaclConfig; instead, add explicit constructor arguments: databases=["default"], auto=False. This will make it easier to extend and test.

Contributor Author:

Added only databases, since during configuration we enforce setting either databases or auto.
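
A hedged sketch of the constructor shape that came out of this exchange; argument names follow the comments above, and the body is illustrative only:

from databricks.sdk import WorkspaceClient


class TaclToolkit:
    def __init__(
        self,
        ws: WorkspaceClient,
        inventory_catalog: str,
        inventory_schema: str,
        databases: list[str] | None = None,  # explicit list instead of a TaclConfig dependency
        warehouse_id: str | None = None,
    ):
        self._ws = ws
        self._inventory_catalog = inventory_catalog
        self._inventory_schema = inventory_schema
        # default to the "default" database when nothing is passed (illustrative)
        self._databases = databases or ["default"]
        self._warehouse_id = warehouse_id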

@nfx (Collaborator) left a comment:

lgtm

@nfx added this pull request to the merge queue on Sep 5, 2023
github-merge-queue bot removed this pull request from the merge queue due to failed status checks on Sep 5, 2023
@nfx (Collaborator) left a comment:

rebase

tests/unit/test_grants.py (outdated, resolved)
@nfx enabled auto-merge on September 5, 2023 at 19:17
@nfx added this pull request to the merge queue on Sep 5, 2023
Merged via the queue into main with commit 26c7246 on Sep 5, 2023
1 check passed
@nfx deleted the schedule-crawler-job branch on September 5, 2023 at 19:23
@nfx mentioned this pull request on Sep 18, 2023
nfx added a commit that referenced this pull request on Sep 18, 2023
# Version changelog

## 0.1.0

Features

* Added interactive installation wizard
([#184](#184),
[#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some
documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation
([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all
databases ([#78](#78),
[#122](#122),
[#151](#151)).
* Created `$inventory.tables` from Scala notebook
([#207](#207)).
* Added local group migration support for ML-related objects
([#56](#56)).
* Added local group migration support for SQL warehouses
([#57](#57)).
* Added local group migration support for all compute-related resources
([#53](#53)).
* Added local group migration support for security-related objects
([#58](#58)).
* Added local group migration support for workflows
([#54](#54)).
* Added local group migration support for workspace-level objects
([#59](#59)).
* Added local group migration support for dashboards, queries, and
alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing
([#204](#204)).
* Added more tests to group.py
([#148](#148)).
* Added tests for group state
([#133](#133)).
* Added tests for inventorizer and typed
([#125](#125)).
* Added tests WorkspaceListing
([#110](#110)).
* Added `make_*_permissions` fixtures
([#159](#159)).
* Added reusable fixtures module
([#119](#119)).
* Added testing for permissions
([#126](#126)).
* Added inventory table manager tests
([#153](#153)).
* Added `product_info` to track as SDK integration
([#76](#76)).
* Added failsafe permission get operations
([#65](#65)).
* Always install the latest `pip` version in `./install.sh`
([#201](#201)).
* Always store inventory in `hive_metastore` and make only
`inventory_database` configurable
([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG` log level
([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk`
([#120](#120)).
* Convert pipeline code to use fixtures.
([#166](#166)).
* Exclude mixins from coverage
([#130](#130)).
* Fixed codecov.io reporting
([#212](#212)).
* Fixed configuration path in job task install code
([#210](#210)).
* Fixed a bug with dependency definitions
([#70](#70)).
* Fixed failing `test_jobs`
([#140](#140)).
* Fixed the issues with experiment listing
([#64](#64)).
* Fixed integration testing configuration
([#77](#77)).
* Make project runnable on nightly testing infrastructure
([#75](#75)).
* Migrated cluster policies to new fixtures
([#174](#174)).
* Migrated clusters to the new fixture framework
([#162](#162)).
* Migrated instance pool to the new fixture framework
([#161](#161)).
* Migrated to `databricks.labs.ucx` package
([#90](#90)).
* Migrated token authorization to new fixtures
([#175](#175)).
* Migrated experiment fixture to standard one
([#168](#168)).
* Migrated jobs test to fixture based one.
([#167](#167)).
* Migrated model fixture to the standard fixtures
([#169](#169)).
* Migrated warehouse fixture to standard one
([#170](#170)).
* Organise modules by domain
([#197](#197)).
* Prefetch all account-level and workspace-level groups
([#192](#192)).
* Programmatically create a dashboard
([#121](#121)).
* Properly integrate Python `logging` facility
([#118](#118)).
* Refactored code to use Databricks SDK for Python
([#27](#27)).
* Refactored configuration and remove global provider state
([#71](#71)).
* Removed `pydantic` dependency
([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and
`pandas` dependencies
([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages
([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState`
([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler
implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests
([#206](#206)).
* Run `build` workflow also on `main` branch
([#211](#211)).
* Run integration test with a single group
([#152](#152)).
* Simplify `SqlBackend` and table creation logic
([#203](#203)).
* Updated `migration_config.yml`
([#179](#179)).
* Updated legal information
([#196](#196)).
* Use `make_secret_scope` fixture
([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and
`make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders
([#176](#176)).
* Validate toolkit notebook test
([#183](#183)).

Contributing

* Added a note on external dependencies
([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime
([#108](#108)).
* Added some ground rules for contributing
([#82](#82)).
* Added contributing instructions link from main readme
([#109](#109)).
* Added info about environment refreshes
([#155](#155)).
* Clarified documentation
([#137](#137)).
* Enabled merge queue
([#146](#146)).
* Improved `CONTRIBUTING.md` guide
([#135](#135),
[#145](#145)).
FastLee pushed a commit that referenced this pull request Sep 19, 2023