Issue with "update" and "ignore" in Pangres #28

BrandonRisley · 2021-01-25T15:44:23Z

Hello! First off, thank you so much for the pangres package!

In the past two weeks, the "update" and "ignore" options for the "if_row_exists" parameter seemed to work as intended. However, I recently tried to use both of these features and did not experience the same results as before. Instead, it appears that the function is just appending data to a MYSQL table instead of checking for the unique primary key before inserting/updating/ignoring. I double checked my primary keys and from what I can tell, all of the primary keys match exactly.

Is this an issue for anyone else?

Code Snippet:

df.drop_duplicates(subset=['item_id'], inplace=True)
df.set_index('item_id', inplace=True)

table_name = 'table'

dtype = {'item_id':VARCHAR(length=200)}

pangres.upsert(engine=engine,
df=df,
table_name=table_name,
if_row_exists='ignore', # update or ignore
dtype = dtype)

Some details:
pangres: 2.2.3
Python : 3.8
MYSQL: 8.0
System: Mac OS
SQLAlchemy : 1.3.18
PyMySQL: 0.10.1

BrandonRisley · 2021-01-25T18:36:59Z

Update:

It appears that the issue is not with Pangres itself but when used in conjunction with Pandas "to_sql". For my purposes, I used one or the other to update a table. When I wanted to update or ignore rows, I would use Pangres, but when I wanted to restructure the entire table, I would use "to_sql" with "replace" as one of the parameters. On a new MYSQL table where "to_sql" has never been used, Pangres works exactly as it should. For some reason I'm unaware of, if a table has used "to_sql" at some point in the table construction/manipulation process, Pangres cannot recognize the primary keys even though the string is exactly the same. Is there a reason for this? Moving forward, I plan to just use Pangres for my needs, but it would be interesting to know why this issue occurs.

ThibTrip · 2021-01-25T19:58:19Z

Hello @BrandonRisley ! I can't investigate it right now. Have a big project coming up on Wednesday evening.

I can already tell you that using to_sql with replace will drop and then recreate the table without the primary key. If pangres still works after that it could be a bug.

Indeed pangres uses SQL statements such as INSERT ... ON CONFLICT (column(s)) DO NOTHING where column(s) must be a primary key or a unique key. I don't think I made tests to see if such SQL statements will work if column(s) is not a primary/unique key.

If you want to use to_sql in conjunction with pangres you could follow the to_sql statement with an ALTER statement to add a primary key. Or you could drop the table and let pangres recreate it for you (it would be the same workflow as in pandas: drop->create). I suppose I can post an example later 🤔, I am still working now.

BrandonRisley · 2021-01-25T20:07:51Z

@ThibTrip No problem! Thank you for the info and your help! I plan on dropping a table manually instead of using "to_sql" , so this will not be an issue in the future.

ThibTrip · 2021-02-07T15:34:31Z

Hello again @BrandonRisley ! Sorry I was very busy and forgot your issue 😿 !

After rereading what you wrote it seems obvious to me now. The problem is not linked to pangres and I'd say it's not really linked to df.to_sql either.

Pangres cannot recognize the primary keys even though the string is exactly the same

Yep it's normal because you lost columns specifications when using df.to_sql(if_exists='replace') which drops then recreates the table.

I plan on dropping a table manually instead of using "to_sql"

That's a good solution 👍 ! I also made a gist describing the problem and giving solutions.

I'll close this issue now but if there is anything else please let me know 👍 .

ThibTrip closed this as completed Feb 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue with "update" and "ignore" in Pangres #28

Issue with "update" and "ignore" in Pangres #28

BrandonRisley commented Jan 25, 2021

BrandonRisley commented Jan 25, 2021

ThibTrip commented Jan 25, 2021

BrandonRisley commented Jan 25, 2021

ThibTrip commented Feb 7, 2021

Issue with "update" and "ignore" in Pangres #28

Issue with "update" and "ignore" in Pangres #28

Comments

BrandonRisley commented Jan 25, 2021

BrandonRisley commented Jan 25, 2021

ThibTrip commented Jan 25, 2021

BrandonRisley commented Jan 25, 2021

ThibTrip commented Feb 7, 2021