-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(uploads): respect db engine spec's supports_multivalues_insert value for file uploads & enable multi-insert for MSSQL #30222
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #30222 +/- ##
===========================================
+ Coverage 60.48% 83.69% +23.20%
===========================================
Files 1931 529 -1402
Lines 76236 38420 -37816
Branches 8568 0 -8568
===========================================
- Hits 46114 32155 -13959
+ Misses 28017 6265 -21752
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but all db engine spec class properites need to be also added to BaseEngineSpec
with conservative defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM but could be simplified more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, really stoked about this PR, LGTM! 🚀
…lue for file uploads & enable multi-insert for MSSQL (#30222) Co-authored-by: Ville Brofeldt <[email protected]> (cherry picked from commit f8a7753)
SUMMARY
This PR makes two changes to enable the setting of
method = 'multi'
when the Pandas function.to_sql
is called to write a file upload to a data warehouse. This speeds up file uploads by 15x. See discussion #30152 for background and discussion.Changes:
df_to_sql
does not currently access thesupports_multivalues_insert
attribute when it is set in a Superset EngineSpec. Instead it relies on this attribute being set in the SQLAlchemy implementation for that dialect. After this PR, Superset will check both of these places for the attribute.supports_multivalues_insert
toTrue
for Microsoft SQL Server (MSSQL), which has supported this behavior since SQL Server 2008.Thanks to @nytai for advising.
TESTING INSTRUCTIONS
Check that uploading a flat file to a data warehouse succeeds. I tested it for MSSQL. This PR was written to not reduce any existing functionality (there's an
or
with the previous condition) but someone could check on another data warehouse too.ADDITIONAL INFORMATION