Experiment with ArrowStream streaming #22673

Merged
merged 2 commits into from
Apr 12, 2021

nvartolomei
Contributor

@nvartolomei nvartolomei commented Apr 5, 2021

POC for fixing ArrowStream reading when source is not a local file.

Before:

ninja clickhouse-local && programs/clickhouse-local -q "SELECT count(), min(number), max(number) FROM url('http://127.0.0.1:3000/inf', 'ArrowStream', 'number UInt64') FORMAT JSON" --logger.level trace
<Information> executeQuery: Read 100000000 rows, 762.94 MiB in 9.683175 sec., 10327191 rows/sec., 78.79 MiB/sec.
**<Debug> MemoryTracker: Peak memory usage (for query): 2.25 GiB.**

After:

<Information> executeQuery: Read 100000000 rows, 762.94 MiB in 6.701079 sec., 14922969 rows/sec., 113.85 MiB/sec.
**<Debug> MemoryTracker: Peak memory usage (for query): 4.59 MiB.**
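
The numbers above are consistent with consuming the stream one batch at a time instead of buffering the whole response before parsing, although the description does not spell out the mechanism. A minimal sketch of that general idea follows. It assumes a made-up length-prefixed framing (a 4-byte count followed by that many UInt64 values), which is not the Arrow IPC wire format, and the helper names `write_frames`, `read_exact`, `iter_batches` and `aggregate` are hypothetical and not taken from ClickHouse or Arrow.

```python
import io
import struct
from typing import Iterable, Iterator, Optional, Tuple


def write_frames(batches: Iterable[Iterable[int]]) -> bytes:
    """Encode each batch as a 4-byte little-endian count plus count UInt64 values.

    This framing is invented for the sketch; it is NOT the Arrow IPC format.
    """
    out = bytearray()
    for batch in batches:
        values = list(batch)
        out += struct.pack("<I", len(values))
        out += struct.pack(f"<{len(values)}Q", *values)
    return bytes(out)


def read_exact(stream, n: int) -> bytes:
    """Read up to n bytes, looping because read() may return short chunks."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def iter_batches(stream) -> Iterator[Tuple[int, ...]]:
    """Yield one batch at a time using only forward reads (no seek, no full buffer)."""
    while True:
        header = read_exact(stream, 4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise EOFError("truncated frame header")
        (count,) = struct.unpack("<I", header)
        payload = read_exact(stream, 8 * count)
        if len(payload) < 8 * count:
            raise EOFError("truncated frame payload")
        yield struct.unpack(f"<{count}Q", payload)


def aggregate(stream) -> Tuple[int, Optional[int], Optional[int]]:
    """Compute count(), min(), max() while holding only one batch in memory."""
    total = 0
    lo: Optional[int] = None
    hi: Optional[int] = None
    for batch in iter_batches(stream):
        if not batch:
            continue
        total += len(batch)
        lo = min(batch) if lo is None else min(lo, min(batch))
        hi = max(batch) if hi is None else max(hi, max(batch))
    return total, lo, hi
```

For example, `aggregate(io.BytesIO(write_frames([[0, 1, 2], [3, 4, 5, 6]])))` processes two frames in turn; peak memory in this sketch is bounded by the largest single frame rather than by the total stream size.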

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve performance of reading from the ArrowStream input format for sources other than a local file (e.g. URL).

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Apr 5, 2021
@nvartolomei nvartolomei marked this pull request as ready for review April 6, 2021 20:52
Member

@KochetovNicolai KochetovNicolai left a comment

LGTM

@KochetovNicolai
Member

zless clickhouse-server.log.gz | grep -a Fatal
2021.04.08 01:47:41.131832 [ 295 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).

@KochetovNicolai KochetovNicolai merged commit 7019a9a into ClickHouse:master Apr 12, 2021
@nvartolomei nvartolomei deleted the nv/exp-arrow-stream branch April 12, 2021 10:43