Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add io_time for reading parquet files #31

Closed
wants to merge 3 commits into from
Closed

Add io_time for reading parquet files #31

wants to merge 3 commits into from

Conversation

mpurins-coralogix
Copy link
Collaborator

Add io_time metric for ParquetExec.

It is enabled only for debug builds and works only when sync_chunk_reader performs IO inside sync_chunk_reader and returns Read object which returns bytes from memory. This is true for our S3FileReader and S3BufferedFileReader, but other ObjectReader might perform IO when read is called on returned Read object (which for example is true for LocalFileReader).

@shortcut-integration
Copy link

This pull request has been linked to Shortcut Story #4340: Measure IO performance.

@mpurins-coralogix mpurins-coralogix deleted the sc-4340 branch May 9, 2022 11:30
thinkharderdev pushed a commit that referenced this pull request Feb 14, 2023
* Initial commit

* initial commit

* failing test

* table scan projection

* closer

* test passes, with some hacks

* use DataFrame (#2)

* update README

* update dependency

* code cleanup (#3)

* Add support for Filter operator and BinaryOp expressions (#4)

* GitHub action (#5)

* Split code into producer and consumer modules (#6)

* Support more functions and scalar types (#7)

* Use substrait 0.1 and datafusion 8.0 (#8)

* use substrait 0.1

* use datafusion 8.0

* update datafusion to 10.0 and substrait to 0.2 (#11)

* Add basic join support (#12)

* Added fetch support (#23)

Added fetch to consumer

Added limit to producer

Added unit tests for limit

Added roundtrip_fill_none() for testing when None input can be converted to 0

Update src/consumer.rs

Co-authored-by: Andy Grove <[email protected]>

Co-authored-by: Andy Grove <[email protected]>

* Upgrade to DataFusion 13.0.0 (#25)

* Add sort consumer and producer (#24)

Add consumer

Add producer and test

Modified error string

* Add serializer/deserializer (#26)

* Add plan and function extension support (#27)

* Add plan and function extension support

* Removed unwraps

* Implement GROUP BY (#28)

* Add consumer, producer and tests for aggregate relation

Change function extension registration from absolute to relative anchor
(reference)

Remove operator to/from reference

* Fixed function registration bug

* Add test

* Addressed PR comments

* Changed field reference from mask to direct reference (#29)

* Changed field reference from masked reference to direct reference

* Handle unsupported case (struct with child)

* Handle SubqueryAlias (#30)

Fixed aggregate function register bug

* Add support for SELECT DISTINCT (#31)

Add test case

* Implement BETWEEN (#32)

* Add case (#33)

* Implement CASE WHEN

* Add more case to test

* Addressed comments

* feat: support explicit catalog/schema names in ReadRel (#34)

* feat: support explicit catalog/schema names in ReadRel

Signed-off-by: Ruihang Xia <[email protected]>

* fix: use re-exported expr crate

Signed-off-by: Ruihang Xia <[email protected]>

Signed-off-by: Ruihang Xia <[email protected]>

* move files to subfolder

* RAT

* remove rust.yaml

* revert .gitignore changes

* tomlfmt

* tomlfmt

Signed-off-by: Ruihang Xia <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: JanKaul <[email protected]>
Co-authored-by: nseekhao <[email protected]>
Co-authored-by: Ruihang Xia <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant