Duckdb performance improvement #151

wjjmjh · 2024-08-18T14:24:18Z

No description provided.

EbiArnie · 2024-08-20T15:02:02Z

I would recommend using DataFrames to collect the data and then copying the DataFrame into the DB in a single step. This should be much faster.
An example:

import pandas as pd
import duckdb
import numpy as np

COUNT = 1000000
# col1: 0..COUNT-1, col2: random integers in [0, 100)
d = {'col1': np.arange(0, COUNT, 1, dtype=int), 'col2': np.random.randint(100, size=COUNT)}
df = pd.DataFrame(data=d)

con = duckdb.connect()

# register the df as a view if we want to use it in a SQL stmt
con.register("test_df", df)

con.execute("CREATE TABLE my_table (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER)")

# Use append if the DataFrame's columns match the table's, else use SQL INSERT
# (here the table has two extra columns, so the INSERT fills them with 0)
# con.append("my_table", df)
con.execute("INSERT INTO my_table SELECT *, 0, 0 FROM test_df")
con.sql("select * from my_table").show()

GavinHuttley · 2024-10-04T22:41:45Z

Hi @wjjmjh, just letting you know I'm thinking it may be more fruitful to define a plugin architecture that simplifies developing different middleware solutions. I'll leave this open until I have that design nailed down.

wjjmjh added 2 commits August 19, 2024 00:04

ENH: _add_features of EnsemblGffDuckDb

edde396

ENH: add_records of EnsemblGffDuckDb

9e49ab2
