chore: temporary branch for IOx update (11-30-2023 to 12-09-2023) #8543

Closed
68 commits
2092b49
chore: rebase to 513fd052bdbf5c7a73de544a876961f780b90a92
appletreeisyellow Dec 14, 2023
26b1e13
Fix regression with Incorrect results when reading parquet files with…
alamb Dec 14, 2023
e4662a1
feat: support `LargeList` in `array_empty` (#8321)
Weijun-H Nov 30, 2023
9ed28ae
Double type argument for to_timestamp function (#8159)
spaydar Nov 30, 2023
265d2da
Support User Defined Table Function (#8306)
Veeupup Nov 30, 2023
ee60e7e
Document timestamp input limits (#8369)
comphead Dec 1, 2023
b63f403
fix: make `ntile` work in some corner cases (#8371)
haohuaijin Dec 1, 2023
2d86c7f
Refactor array_union function to use a generic (#8381)
Weijun-H Dec 1, 2023
3c50b7c
Refactor function argument handling in (#8387)
Weijun-H Dec 1, 2023
4b130b9
Materialize dictionaries in group keys (#7647) (#8291)
qrilka Dec 1, 2023
c7c83b2
Rewrite `array_ndims` to fix List(Null) handling (#8320)
jayzhan211 Dec 1, 2023
89d9e1a
Docs: Improve the documentation on `ScalarValue` (#8378)
alamb Dec 2, 2023
fed4977
Avoid concat for `array_replace` (#8337)
jayzhan211 Dec 2, 2023
d214ebe
add a summary table to benchmark compare output (#8399)
razeghi71 Dec 2, 2023
6d93a85
Refactors on TreeNode Implementations (#8395)
berkaysynnada Dec 2, 2023
372da21
feat: support `LargeList` in `make_array` and `array_length` (#8121)
Weijun-H Dec 3, 2023
652dab0
remove `unalias()` TableScan filters when create Physical Filter (#8404)
jackwener Dec 3, 2023
61d8194
Update custom-table-providers.md (#8409)
nickpoorman Dec 4, 2023
8bbc7ff
fix transforming `LogicalPlan::Explain` use `TreeNode::transform` fai…
haohuaijin Dec 4, 2023
ad59f5e
Docs: Fix `array_except` documentation example (#8407)
Asura7969 Dec 4, 2023
8875817
Support named query parameters (#8384)
Asura7969 Dec 4, 2023
77e693e
Minor: Add installation link to README.md (#8389)
Weijun-H Dec 4, 2023
b77b2bf
Update code comment for the cases of regularized RANGE frame and add …
viirya Dec 4, 2023
d044deb
Minor: Add example with parameters to LogicalPlan (#8418)
alamb Dec 5, 2023
684742a
Minor: Improve `PruningPredicate` documentation (#8394)
alamb Dec 5, 2023
9a4fefa
feat: ScalarValue from String (#8411)
QuenKar Dec 5, 2023
ebf7f85
Bump actions/labeler from 4.3.0 to 5.0.0 (#8422)
dependabot[bot] Dec 5, 2023
11c5bb8
Update sqlparser requirement from 0.39.0 to 0.40.0 (#8338)
dependabot[bot] Dec 5, 2023
5c0c619
feat: support `LargeList` for `array_has`, `array_has_all` and `array…
Weijun-H Dec 5, 2023
584bc3c
Union `schema` can't be a subset of the child schema (#8408)
jackwener Dec 5, 2023
6dd353e
Move `PartitionSearchMode` into datafusion_physical_plan, rename to `…
alamb Dec 5, 2023
5b20810
Make filter selectivity for statistics configurable (#8243)
edmondop Dec 5, 2023
d995ae9
fix: Changed labeler.yml to latest format (#8431)
viirya Dec 6, 2023
da6efaf
Minor: Use `ScalarValue::from` impl for strings (#8429)
alamb Dec 6, 2023
e97b2f7
Support crossjoin in substrait. (#8427)
my-vegetable-has-exploded Dec 6, 2023
4679f60
Fix ambiguous reference when aliasing in combination with `ORDER BY` …
Asura7969 Dec 6, 2023
8e8ae88
Minor: convert marcro `list-slice` and `slice` to function (#8424)
Weijun-H Dec 6, 2023
9550121
Remove macro in iter_to_array for List (#8414)
jayzhan211 Dec 6, 2023
0cd615c
fix: Literal in `ORDER BY` window definition should not be an ordinal…
viirya Dec 6, 2023
733b8ef
feat: customize column default values for external tables (#8415)
jonahgao Dec 6, 2023
34b2b3c
feat: Support `array_sort`(`list_sort`) (#8279)
Asura7969 Dec 6, 2023
18f8149
Bugfix: Remove df-cli specific SQL statment options before executing …
devinjdangelo Dec 6, 2023
304d7ae
Detect when filters make subqueries scalar (#8312)
Jesse-Bakker Dec 6, 2023
608bbb2
Add alias check to optimize projections merge (#8438)
mustafasrepo Dec 7, 2023
0177434
Fix PartialOrd for ScalarValue::List/FixSizeList/LargeList (#8253)
jayzhan211 Dec 7, 2023
d409d07
Support parquet_metadata for datafusion-cli (#8413)
Veeupup Dec 7, 2023
5f69fc7
Fix bug in optimizing a nested count (#8459)
Dandandan Dec 7, 2023
bcb8fd8
Bump actions/setup-python from 4 to 5 (#8449)
dependabot[bot] Dec 7, 2023
4482af6
fix: ORDER BY window definition should work on null (#8444)
viirya Dec 7, 2023
4dde70e
flx clippy warnings (#8455)
waynexia Dec 8, 2023
4cbd1c5
fix: RANGE frame for corner cases with empty ORDER BY clause should b…
viirya Dec 8, 2023
a6e5c64
Preserve `dict_id` on `Field` during serde roundtrip (#8457)
avantgardnerio Dec 8, 2023
665c068
support inter leave node (#8460)
liukun4515 Dec 8, 2023
b5c0c60
Not fail when window input is empty record batch (#8466)
mustafasrepo Dec 8, 2023
6f295db
update cast (#8458)
Weijun-H Dec 8, 2023
aa07f26
fix: don't unifies projection if expr is non-trival (#8454)
haohuaijin Dec 8, 2023
ea6ab10
Minor: Add new bloom filter predicate tests (#8433)
alamb Dec 8, 2023
ebdc7da
Add PRIMARY KEY Aggregate support to dataframe API (#8356)
mustafasrepo Dec 8, 2023
ddfa774
Minor: refactor `data_trunc` to reduce duplicated code (#8430)
Weijun-H Dec 8, 2023
cdf2cdb
Support array_distinct function. (#8268)
my-vegetable-has-exploded Dec 8, 2023
6991435
Add primary key support to stream table (#8467)
mustafasrepo Dec 8, 2023
2a94ba6
Add `evaluate_demo` and `range_analysis_demo` to Expr examples (#8377)
alamb Dec 8, 2023
216d2d6
fix typo (#8473)
Weijun-H Dec 8, 2023
ee84e15
Fix comment typo in table.rs: s/indentical/identical/ (#8469)
KeunwooLee-at Dec 8, 2023
79da181
Remove `define_array_slice` and reuse `array_slice` for `array_pop_fr…
jayzhan211 Dec 9, 2023
834da74
Minor: refactor `trim` to clean up duplicated code (#8434)
Weijun-H Dec 9, 2023
fc7154e
Split `EmptyExec` into `PlaceholderRowExec` (#8446)
razeghi71 Dec 9, 2023
a9e50c8
Fix group by aliased expression in LogicalPLanBuilder::aggregate (#8629)
alamb Dec 26, 2023
2 changes: 1 addition & 1 deletion .github/workflows/dev.yml
@@ -30,7 +30,7 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
       - name: Setup Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
       - name: Audit licenses
2 changes: 1 addition & 1 deletion .github/workflows/dev_pr.yml
@@ -46,7 +46,7 @@ jobs:
           github.event_name == 'pull_request_target' &&
             (github.event.action == 'opened' ||
              github.event.action == 'synchronize')
-        uses: actions/labeler@v4.3.0
+        uses: actions/labeler@v5.0.0
         with:
           repo-token: ${{ secrets.GITHUB_TOKEN }}
           configuration-path: .github/workflows/dev_pr/labeler.yml
34 changes: 18 additions & 16 deletions .github/workflows/dev_pr/labeler.yml
@@ -16,35 +16,37 @@
 # under the License.

 development-process:
-  - dev/**.*
-  - .github/**.*
-  - ci/**.*
-  - .asf.yaml
+- changed-files:
+  - any-glob-to-any-file: ['dev/**.*', '.github/**.*', 'ci/**.*', '.asf.yaml']

 documentation:
-  - docs/**.*
-  - README.md
-  - ./**/README.md
-  - DEVELOPERS.md
-  - datafusion/docs/**.*
+- changed-files:
+  - any-glob-to-any-file: ['docs/**.*', 'README.md', './**/README.md', 'DEVELOPERS.md', 'datafusion/docs/**.*']

 sql:
-  - datafusion/sql/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/sql/**/*']

 logical-expr:
-  - datafusion/expr/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/expr/**/*']

 physical-expr:
-  - datafusion/physical-expr/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/physical-expr/**/*']

 optimizer:
-  - datafusion/optimizer/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/optimizer/**/*']

 core:
-  - datafusion/core/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/core/**/*']

 substrait:
-  - datafusion/substrait/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/substrait/**/*']

 sqllogictest:
-  - datafusion/sqllogictest/**/*
+- changed-files:
+  - any-glob-to-any-file: ['datafusion/sqllogictest/**/*']
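
The labeler.yml change above moves every label from a flat list of globs to a nested `changed-files` / `any-glob-to-any-file` entry. As a rough illustration only (not part of this PR), the mapping can be sketched in Python. The function name `to_v5_labeler_config` and the plain-dict representation of the config are assumptions made for this example; no YAML parsing is shown.

```python
def to_v5_labeler_config(old_config: dict[str, list[str]]) -> dict[str, list[dict]]:
    """Wrap each label's glob list in the nested structure used by the new file.

    Hypothetical helper: `old_config` maps a label name to its list of globs,
    mirroring the removed lines of labeler.yml.
    """
    return {
        label: [{"changed-files": [{"any-glob-to-any-file": list(globs)}]}]
        for label, globs in old_config.items()
    }


# Two labels taken from the diff above.
old = {
    "sql": ["datafusion/sql/**/*"],
    "development-process": ["dev/**.*", ".github/**.*", "ci/**.*", ".asf.yaml"],
}
new = to_v5_labeler_config(old)
print(new["sql"])
```

Each label keeps the same globs; only the wrapping changes, which is why the diff adds two lines per label while removing one line per glob.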
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -24,7 +24,7 @@ jobs:
           path: asf-site

       - name: Setup Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"

2 changes: 1 addition & 1 deletion .github/workflows/rust.yml
@@ -348,7 +348,7 @@ jobs:
       - uses: actions/checkout@v4
         with:
           submodules: true
-      - uses: actions/setup-python@v4
+      - uses: actions/setup-python@v5
         with:
           python-version: "3.8"
       - name: Install PyArrow
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -85,7 +85,7 @@ parquet = { version = "49.0.0", default-features = false, features = ["arrow", "
 rand = "0.8"
 rstest = "0.18.0"
 serde_json = "1"
-sqlparser = { version = "0.39.0", features = ["visitor"] }
+sqlparser = { version = "0.40.0", features = ["visitor"] }
 tempfile = "3"
 thiserror = "1.0.44"
 chrono = { version = "0.4.31", default-features = false }
1 change: 1 addition & 0 deletions README.md
@@ -28,6 +28,7 @@ in-memory format. [Python Bindings](https://github.com/apache/arrow-datafusion-p
 Here are links to some important information

 - [Project Site](https://arrow.apache.org/datafusion)
+- [Installation](https://arrow.apache.org/datafusion/user-guide/cli.html#installation)
 - [Rust Getting Started](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
 - [Rust DataFrame API](https://arrow.apache.org/datafusion/user-guide/dataframe.html)
 - [Rust API docs](https://docs.rs/datafusion/latest/datafusion)
39 changes: 34 additions & 5 deletions benchmarks/compare.py
@@ -109,7 +109,6 @@ def compare(
     noise_threshold: float,
 ) -> None:
     baseline = BenchmarkRun.load_from_file(baseline_path)
-
     comparison = BenchmarkRun.load_from_file(comparison_path)

     console = Console()
@@ -124,27 +123,57 @@ def compare(
     table.add_column(comparison_header, justify="right", style="dim")
     table.add_column("Change", justify="right", style="dim")

+    faster_count = 0
+    slower_count = 0
+    no_change_count = 0
+    total_baseline_time = 0
+    total_comparison_time = 0
+
     for baseline_result, comparison_result in zip(baseline.queries, comparison.queries):
         assert baseline_result.query == comparison_result.query

+        total_baseline_time += baseline_result.execution_time
+        total_comparison_time += comparison_result.execution_time
+
         change = comparison_result.execution_time / baseline_result.execution_time

         if (1.0 - noise_threshold) <= change <= (1.0 + noise_threshold):
-            change = "no change"
+            change_text = "no change"
+            no_change_count += 1
         elif change < 1.0:
-            change = f"+{(1 / change):.2f}x faster"
+            change_text = f"+{(1 / change):.2f}x faster"
+            faster_count += 1
         else:
-            change = f"{change:.2f}x slower"
+            change_text = f"{change:.2f}x slower"
+            slower_count += 1

         table.add_row(
             f"Q{baseline_result.query}",
             f"{baseline_result.execution_time:.2f}ms",
             f"{comparison_result.execution_time:.2f}ms",
-            change,
+            change_text,
         )

     console.print(table)

+    # Calculate averages
+    avg_baseline_time = total_baseline_time / len(baseline.queries)
+    avg_comparison_time = total_comparison_time / len(comparison.queries)
+
+    # Summary table
+    summary_table = Table(show_header=True, header_style="bold magenta")
+    summary_table.add_column("Benchmark Summary", justify="left", style="dim")
+    summary_table.add_column("", justify="right", style="dim")
+
+    summary_table.add_row(f"Total Time ({baseline_header})", f"{total_baseline_time:.2f}ms")
+    summary_table.add_row(f"Total Time ({comparison_header})", f"{total_comparison_time:.2f}ms")
+    summary_table.add_row(f"Average Time ({baseline_header})", f"{avg_baseline_time:.2f}ms")
+    summary_table.add_row(f"Average Time ({comparison_header})", f"{avg_comparison_time:.2f}ms")
+    summary_table.add_row("Queries Faster", str(faster_count))
+    summary_table.add_row("Queries Slower", str(slower_count))
+    summary_table.add_row("Queries with No Change", str(no_change_count))
+
+    console.print(summary_table)
+
 def main() -> None:
     parser = ArgumentParser()
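
The classification and counting that the compare.py change introduces can be tried in isolation. The following is a minimal sketch, not the code from this PR: `classify_change`, `summarize`, and the `Summary` dataclass are hypothetical names, timings are passed as plain `(baseline_ms, comparison_ms)` pairs instead of `BenchmarkRun` objects, and the default `noise_threshold` of 0.05 is an assumption for the example.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    # Counters matching the three rows added to the summary table.
    faster: int = 0
    slower: int = 0
    no_change: int = 0


def classify_change(baseline_ms: float, comparison_ms: float, noise_threshold: float) -> str:
    """Return 'no change', 'faster' or 'slower' for one query."""
    ratio = comparison_ms / baseline_ms
    # Ratios within +/- noise_threshold of 1.0 are treated as noise.
    if (1.0 - noise_threshold) <= ratio <= (1.0 + noise_threshold):
        return "no change"
    return "faster" if ratio < 1.0 else "slower"


def summarize(pairs, noise_threshold: float = 0.05) -> Summary:
    """Count how many (baseline_ms, comparison_ms) pairs fall in each class."""
    summary = Summary()
    for baseline_ms, comparison_ms in pairs:
        label = classify_change(baseline_ms, comparison_ms, noise_threshold)
        if label == "faster":
            summary.faster += 1
        elif label == "slower":
            summary.slower += 1
        else:
            summary.no_change += 1
    return summary


timings = [(100.0, 50.0), (100.0, 200.0), (100.0, 102.0), (10.0, 10.0)]
print(summarize(timings))
```

Keeping the numeric ratio separate from its display label mirrors the rename from `change` to `change_text` in the diff, where the ratio is no longer overwritten by the formatted string.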