Renaming a table may conflict with the new table with old table name #6890

Closed
coolderli opened this issue Feb 21, 2023 · 7 comments

@coolderli
Contributor

Apache Iceberg version

1.0.0

Query engine

Spark

Please describe the bug 🐞

I searched the existing issues but didn't find an answer, so here is my problem.

I queried the table named app_hr_talent_empl_df_1d_v2_drop and got: File does not exist: /xxxx/xxdb/app_hr_talent_empl_df_1d_v2/metadata/00243-109f2aee-78a2-42b1-bfba-6a69e18b919b.metadata.json

Earlier, I had renamed the table app_hr_talent_empl_df_1d_v2 to app_hr_talent_empl_df_1d_v2_drop and then created a new table with the name app_hr_talent_empl_df_1d_v2.

So I think the orphan file cleanup procedure for table app_hr_talent_empl_df_1d_v2 deleted the file /xxxx/xxdb/app_hr_talent_empl_df_1d_v2/metadata/00243-109f2aee-78a2-42b1-bfba-6a69e18b919b.metadata.json.

How should we deal with this situation? Thanks.

@coolderli coolderli changed the title Renaming a table may conflict with the new table name Renaming a table may conflict with the new table with old table name Feb 21, 2023
@nastra
Contributor

nastra commented Feb 21, 2023

@coolderli could you please describe your catalog configuration and which operations you are executing? That may help in pinpointing the issue. Logs would also be helpful in case you have them.

This might be a shot in the dark, but could you try with cache-enabled = false, or run REFRESH?
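
For readers following along, a minimal sketch of what that suggestion could look like as Spark settings. The catalog name `my_catalog` and the table name are placeholders, not taken from this thread, and the helper functions are hypothetical. The `cache-enabled` key is the Iceberg catalog property mentioned above, and `REFRESH TABLE` is the Spark SQL statement.

```python
def iceberg_catalog_conf(catalog_name: str, cache_enabled: bool) -> dict:
    """Build Spark conf entries for a Hive-backed Iceberg catalog.

    `catalog_name` is a placeholder chosen by the user.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        # Disables catalog-level caching of loaded tables.
        f"{prefix}.cache-enabled": str(cache_enabled).lower(),
    }


def refresh_statement(table: str) -> str:
    """Spark SQL statement that invalidates cached metadata for one table."""
    return f"REFRESH TABLE {table}"


conf = iceberg_catalog_conf("my_catalog", cache_enabled=False)
stmt = refresh_statement("my_catalog.db.some_table")
```

Each key/value pair in `conf` would be passed to Spark as a `--conf` option or set on the session builder. As the later comments show, caching turned out not to be the cause here.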

@coolderli
Contributor Author

@nastra Thank you for your attention.
I use the Hive catalog. We periodically run orphan file cleanup for each table.
As described above, I renamed table_a to table_b and then recreated table_a. As a result, table_a and table_b share the same data directory.
The orphan file cleanup job for table_a deletes table_b's live files, because table_a does not reference them, so querying table_b fails with a FileNotFoundException.
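
To make the failure mode concrete, here is a toy model of the sequence described above. It is only a sketch under simplifying assumptions: storage is a set of path strings, a rename changes the catalog name but not the location, and cleanup deletes every file under the table location that the table does not reference. The paths and class names are made up for illustration, and this is not Iceberg's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    location: str
    # Files that this table's metadata points to (its "live" files).
    referenced: set = field(default_factory=set)


def remove_orphans(table: Table, storage: set) -> set:
    """Delete files under the table location that the table does not reference."""
    prefix = table.location.rstrip("/") + "/"
    orphans = {f for f in storage if f.startswith(prefix) and f not in table.referenced}
    storage.difference_update(orphans)
    return orphans


storage = set()
catalog = {}
location = "/warehouse/db/table_a"  # hypothetical default location derived from the name

# 1. Create table_a and write one metadata file.
catalog["table_a"] = Table(location, {location + "/metadata/old.metadata.json"})
storage |= catalog["table_a"].referenced

# 2. Rename table_a to table_b: only the name changes, the location stays.
catalog["table_b"] = catalog.pop("table_a")

# 3. Recreate table_a: the default location is the same directory again.
catalog["table_a"] = Table(location, {location + "/metadata/new.metadata.json"})
storage |= catalog["table_a"].referenced

# 4. Orphan cleanup for the new table_a removes files it does not know about.
deleted = remove_orphans(catalog["table_a"], storage)

# table_b now points at a file that no longer exists.
missing = catalog["table_b"].referenced - storage
```

After step 4, `deleted` and `missing` both contain the old metadata file, which corresponds to the "File does not exist" error in the original report.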

@ajantha-bhat
Member

ajantha-bhat commented Feb 21, 2023

True. As both the tables share the same path, running remove orphan files on one table can clean up the live files of another table.

I don't think we have a fix for this based on the current remove orphan files design.

For the Nessie catalog, I am using a UUID in the table path during "create table" to avoid this problem. Maybe we need to enforce this for all the catalogs (I can work on this once people agree to this change).

As a workaround, when you create a new table, you can specify a new table location in the create table statement or in the table properties.
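
A rough sketch of that workaround, assuming you generate the location yourself and pass it through the `LOCATION` clause of a Spark SQL `CREATE TABLE` statement. The helper names, the warehouse path, the column list, and the `<table>_<uuid>` directory layout are all assumptions for illustration; the exact path format Nessie uses is not shown in this thread.

```python
import uuid


def unique_location(warehouse: str, db: str, table: str) -> str:
    """Return a table directory that is never reused, even for a recreated name."""
    return f"{warehouse.rstrip('/')}/{db}/{table}_{uuid.uuid4().hex}"


def create_table_sql(identifier: str, columns: str, location: str) -> str:
    """Build a CREATE TABLE statement with an explicit location."""
    return f"CREATE TABLE {identifier} ({columns}) USING iceberg LOCATION '{location}'"


first = unique_location("/warehouse", "db", "table_a")
second = unique_location("/warehouse", "db", "table_a")
sql = create_table_sql("db.table_a", "id bigint, data string", second)
```

Because `first` and `second` differ, a table recreated under an old name gets its own directory, and orphan cleanup on it cannot reach the files of the renamed table.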

@szehon-ho
Collaborator

Yeah, this is a tough problem I've thought about before as well. The only workaround is to create the table with a new location, as Ajantha mentioned. I feel the only real fix would be to maintain your own index of table to location, and then check whether a location is taken before allowing a table to be created at that location. I'm not sure if that's a feature we can have catalogs support at some point (though it won't help in cross-catalog cases).
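
A minimal sketch of such an index, kept outside the catalog. Everything here (the `LocationIndex` class, its methods, the paths) is hypothetical and only illustrates the check described above; it matches locations exactly and does not detect one table location nested inside another.

```python
class LocationIndex:
    """Tracks which table currently owns which storage location."""

    def __init__(self):
        self._owner_by_location = {}

    def claim(self, location: str, table: str) -> None:
        """Register a location for a table, refusing locations owned by another table."""
        key = location.rstrip("/")
        owner = self._owner_by_location.get(key)
        if owner is not None and owner != table:
            raise ValueError(f"location {key} is already used by table {owner}")
        self._owner_by_location[key] = table

    def rename(self, old: str, new: str) -> None:
        """Follow a table rename: the location stays, only the owner name changes."""
        for location, owner in self._owner_by_location.items():
            if owner == old:
                self._owner_by_location[location] = new


index = LocationIndex()
index.claim("/warehouse/db/table_a", "table_a")
index.rename("table_a", "table_b")

try:
    # Recreating table_a at its old default location is now rejected.
    index.claim("/warehouse/db/table_a/", "table_a")
    conflict = None
except ValueError as err:
    conflict = str(err)
```

Here `conflict` holds the rejection message, so the create would have to pick a different location. As noted above, an index like this only helps within the set of catalogs that consult it.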

@szehon-ho
Collaborator

I remember there are some related thoughts on the matter: #4159

@github-actions

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Aug 21, 2023

github-actions bot commented Jan 3, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

@github-actions github-actions bot closed this as not planned Jan 3, 2024
4 participants