Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with "update" and "ignore" in Pangres #28

Closed
BrandonRisley opened this issue Jan 25, 2021 · 4 comments
Closed

Issue with "update" and "ignore" in Pangres #28

BrandonRisley opened this issue Jan 25, 2021 · 4 comments

Comments

@BrandonRisley
Copy link

Hello! First off, thank you so much for the pangres package!

In the past two weeks, the "update" and "ignore" options for the "if_row_exists" parameter seemed to work as intended. However, I recently tried to use both of these features and did not experience the same results as before. Instead, it appears that the function is just appending data to a MYSQL table instead of checking for the unique primary key before inserting/updating/ignoring. I double checked my primary keys and from what I can tell, all of the primary keys match exactly.

Is this an issue for anyone else?

Code Snippet:

df.drop_duplicates(subset=['item_id'], inplace=True)
df.set_index('item_id', inplace=True)

table_name = 'table'

dtype = {'item_id':VARCHAR(length=200)}

pangres.upsert(engine=engine,
df=df,
table_name=table_name,
if_row_exists='ignore', # update or ignore
dtype = dtype)

Some details:
pangres: 2.2.3
Python : 3.8
MYSQL: 8.0
System: Mac OS
SQLAlchemy : 1.3.18
PyMySQL: 0.10.1

@BrandonRisley
Copy link
Author

Update:

It appears that the issue is not with Pangres itself but when used in conjunction with Pandas "to_sql". For my purposes, I used one or the other to update a table. When I wanted to update or ignore rows, I would use Pangres, but when I wanted to restructure the entire table, I would use "to_sql" with "replace" as one of the parameters. On a new MYSQL table where "to_sql" has never been used, Pangres works exactly as it should. For some reason I'm unaware of, if a table has used "to_sql" at some point in the table construction/manipulation process, Pangres cannot recognize the primary keys even though the string is exactly the same. Is there a reason for this? Moving forward, I plan to just use Pangres for my needs, but it would be interesting to know why this issue occurs.

@ThibTrip
Copy link
Owner

Hello @BrandonRisley ! I can't investigate it right now. Have a big project coming up on Wednesday evening.

I can already tell you that using to_sql with replace will drop and then recreate the table without the primary key. If pangres still works after that it could be a bug.

Indeed pangres uses SQL statements such as INSERT ... ON CONFLICT (column(s)) DO NOTHING where column(s) must be a primary key or a unique key. I don't think I made tests to see if such SQL statements will work if column(s) is not a primary/unique key.

If you want to use to_sql in conjunction with pangres you could follow the to_sql statement with an ALTER statement to add a primary key. Or you could drop the table and let pangres recreate it for you (it would be the same workflow as in pandas: drop->create). I suppose I can post an example later 🤔, I am still working now.

@BrandonRisley
Copy link
Author

@ThibTrip No problem! Thank you for the info and your help! I plan on dropping a table manually instead of using "to_sql" , so this will not be an issue in the future.

@ThibTrip
Copy link
Owner

ThibTrip commented Feb 7, 2021

Hello again @BrandonRisley ! Sorry I was very busy and forgot your issue 😿 !

After rereading what you wrote it seems obvious to me now. The problem is not linked to pangres and I'd say it's not really linked to df.to_sql either.

Pangres cannot recognize the primary keys even though the string is exactly the same

Yep it's normal because you lost columns specifications when using df.to_sql(if_exists='replace') which drops then recreates the table.

I plan on dropping a table manually instead of using "to_sql"

That's a good solution 👍 ! I also made a gist describing the problem and giving solutions.

I'll close this issue now but if there is anything else please let me know 👍 .

@ThibTrip ThibTrip closed this as completed Feb 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants