Duckdb performance improvement #151

Open
wants to merge 2 commits into
base: add-duckdb

Conversation

wjjmjh
Member

@wjjmjh wjjmjh commented Aug 18, 2024

No description provided.

@EbiArnie

I would recommend collecting the data in a DataFrame and then copying that DataFrame into the DB. This should be much faster.
An example:

import pandas as pd
import duckdb
import numpy as np

COUNT = 1000000
d = {'col1': np.arange(0, COUNT, 1, dtype=int), 'col2': np.random.randint(100, size=COUNT)}
df = pd.DataFrame(data=d)

con = duckdb.connect()

# register the df as a view if we want to use it in a SQL stmt
con.register("test_df", df)

con.execute("CREATE TABLE my_table (col1 INTEGER, col2 int, col3 int, col4 int)")

# Use append if columns match, else use SQL INSERT
# con.append("my_table", df)
con.execute("INSERT INTO my_table SELECT *, 0, 0 FROM test_df")

# show the resulting table
con.sql("select * from my_table").show()

@GavinHuttley

Hi @wjjmjh, just letting you know that I think it may be more fruitful to define a plugin architecture that simplifies developing different middleware solutions. I'll leave this open until I have that design nailed down.


3 participants