Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable highWaterMark for streaming #558

Open
APTy opened this issue Jun 29, 2023 · 9 comments
Open

Configurable highWaterMark for streaming #558

APTy opened this issue Jun 29, 2023 · 9 comments
Assignees
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team

Comments

@APTy
Copy link
Contributor

APTy commented Jun 29, 2023

Would there be a way to set the highWaterMark for streams? The improvements in #43 are great, but the default of 10 feels a bit low, and would be nice to be able to tweak that value a bit to find something that works for each use case to get optimal throughput.

@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Jun 30, 2023
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Jun 30, 2023
@sfc-gh-dszmolka
Copy link
Collaborator

hi and thank you for raising this issue! rowStreamHighWaterMark seems to be there for this, could you please try setting it in Connection and see if it works for you ? example:

var connection = snowflake.createConnection({
    account: account,
    username: user,
..
    rowStreamHighWaterMark = 20 // default=10
  });

@sfc-gh-dszmolka sfc-gh-dszmolka added status-information_needed Additional information is required from the reporter question Issue is a usage/other question rather than a bug and removed status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter labels Jun 30, 2023
@APTy
Copy link
Contributor Author

APTy commented Jun 30, 2023

Thanks for the quick response! In my experience, it doesn't seem that this parameter is actually configurable. This is confirmed by @sfc-gh-ext-simba-lf here in #505.

I'd be happy to open a PR to add external: true to this param so that users can configure it, if that's something the team wants to expose to users.

@sfc-gh-dszmolka
Copy link
Collaborator

sfc-gh-dszmolka commented Jun 30, 2023

ah, missed that - thank you for pointing it out. definitely something to discuss, i don't see a reason why we would not want to expose it, but i'm checking with the team.

edit: of course the PR is welcome but first I want to make sure we can/want actually expose it. missing the part why we didn't do so in the first place, maybe there was a reason I'm not aware.

@sfc-gh-dszmolka sfc-gh-dszmolka added enhancement The issue is a request for improvement or a new feature status-in_progress Issue is worked on by the driver team and removed question Issue is a usage/other question rather than a bug labels Jun 30, 2023
@APTy
Copy link
Contributor Author

APTy commented Jun 30, 2023

Appreciate it! If we indeed want to make it configurable, I guess there's a question of:

  • Should it be configured on the connection (createConnection()) or the session (conn.execute()) or the result stream (stmt.streamRows())?
  • Or something different entirely, like the CLIENT_RESULT_CHUNK_SIZE param?

(Yes - we may not want to actually expose it, but then it'd be good to know whether 10 is indeed an optimal value, or whether some performance is being left on the table because of it)

@APTy
Copy link
Contributor Author

APTy commented Jul 12, 2023

Hey @sfc-gh-dszmolka thanks for the help so far, just checking whether there's been any movement internally on this one yet 😄

@sfc-gh-dszmolka
Copy link
Collaborator

hi @APTy at this moment nothing to share but will keep this updated once i hear any interesting back. We're also considering a more systematic approach of reviewing all the currently internal parameters to see if more of them could be made external, as some of them seem useful to have control over for the user. Thank you for bearing with us !

@sfc-gh-dszmolka sfc-gh-dszmolka removed the status-in_progress Issue is worked on by the driver team label Dec 28, 2023
@cleve-fauna
Copy link

cleve-fauna commented Feb 10, 2024

Any update on this?

The current lack of configuration on streams makes simpler code suffer in performance - and we end up doing manual batching using the start + end params instead (so that everything comes into memory at once).

Getting more control over the streaming parameters would allows to write much simpler code such as:

let batch = [];
await (for row of snowflake.streamRows()) {
   batch.push(row);
   if (batch.length === batchSize) {
       // do something
       batch = [];
   }
}

if (batch.length > 0) {
   //doSomething
}

But right now we can't use this pattern because the performance is too slow.

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage_done Initial triage done, will be further handled by the driver team label Feb 11, 2024
@sfc-gh-dszmolka
Copy link
Collaborator

At this moment I don't have any updates on this, but will keep this thread posted as soon as there's any. Thank you for bearing with us !

@sfc-gh-dszmolka
Copy link
Collaborator

After syncing internally with the team, it looks like now this is planned to be finished by end of Q1 (April 2024) but likely sooner than that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

4 participants