Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE] Enable read_sql for swordfish #2918

Merged
merged 12 commits into from
Oct 8, 2024
Merged

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Sep 25, 2024

Enables read_sql tests for swordfish.

Additionally adds an EmptyScan operator. This is needed because some sql tests produce 0 scan tasks (some other IO tests that are skipped also need empty scan, so this PR will enable those to pass as well, I will unskip them in future PRs).

Technically, we don't need a dedicated empty scan operator for this, but it might be useful to be explicit. (For what it's worth the current runner does have an empty scan operator as well)

@github-actions github-actions bot added the chore label Sep 25, 2024
Copy link

codspeed-hq bot commented Sep 25, 2024

CodSpeed Performance Report

Merging #2918 will not alter performance

Comparing colin/swordfish-read-sql (df5d4d1) with main (64b8699)

Summary

✅ 17 untouched benchmarks

Copy link

codecov bot commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 50.94340% with 26 lines in your changes missing coverage. Please review.

Project coverage is 78.47%. Comparing base (3f37a69) to head (df5d4d1).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-local-execution/src/sources/scan_task.rs 0.00% 24 Missing ⚠️
src/daft-micropartition/src/python.rs 0.00% 1 Missing ⚠️
src/daft-physical-plan/src/local_plan.rs 87.50% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2918      +/-   ##
==========================================
+ Coverage   78.43%   78.47%   +0.03%     
==========================================
  Files         603      610       +7     
  Lines       71504    71742     +238     
==========================================
+ Hits        56086    56299     +213     
- Misses      15418    15443      +25     
Files with missing lines Coverage Δ
src/daft-local-execution/src/pipeline.rs 91.51% <100.00%> (+0.11%) ⬆️
src/daft-local-execution/src/sources/empty_scan.rs 100.00% <100.00%> (ø)
src/daft-physical-plan/src/translate.rs 91.95% <100.00%> (+0.18%) ⬆️
src/daft-micropartition/src/python.rs 76.74% <0.00%> (ø)
src/daft-physical-plan/src/local_plan.rs 85.80% <87.50%> (+0.09%) ⬆️
src/daft-local-execution/src/sources/scan_task.rs 81.95% <0.00%> (-9.36%) ⬇️

... and 36 files with indirect coverage changes

@colin-ho colin-ho marked this pull request as ready for review September 27, 2024 18:44
return Err(common_error::DaftError::TypeError(
"Database file format not implemented".to_string(),
));
FileFormatConfig::Database(common_file_formats::DatabaseSourceConfig { sql, conn }) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing read_sql code for the current executor is here:

FileFormatConfig::Database(DatabaseSourceConfig { sql, conn }) => {
let predicate = scan_task
.pushdowns
.filters
.as_ref()
.map(|p| (*p.as_ref()).clone().into());
Python::with_gil(|py| {
let table = crate::python::read_sql_into_py_table(
py,
sql,
conn,
predicate.clone(),
scan_task.schema.clone().into(),
scan_task
.pushdowns
.columns
.as_ref()
.map(|cols| cols.as_ref().clone()),
scan_task.pushdowns.limit,
)
.map(std::convert::Into::into)
.context(PyIOSnafu)?;
Ok(vec![table])
})?
}
, i just copied it for swordfish. The only difference is that a scan task for swordfish returns a stream, in the case for sql however, there's no way currently to stream results from a database read, so no choice but to wrap the result in a futures::stream::once

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to leave a comment about this

Copy link
Contributor

@desmondcheongzx desmondcheongzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@colin-ho colin-ho enabled auto-merge (squash) October 8, 2024 23:01
@colin-ho colin-ho merged commit 73ff3f3 into main Oct 8, 2024
39 checks passed
@colin-ho colin-ho deleted the colin/swordfish-read-sql branch October 8, 2024 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants