Job Coordination Improvement Proposal #761

Closed
pyalex opened this issue Jun 1, 2020 · 1 comment · Fixed by #792

Comments

@pyalex
Collaborator

pyalex commented Jun 1, 2020

Is your feature request related to a problem? Please describe.

There's some technical debt in the ingestion part, mostly related to Job Coordination: a job is restarted on schema update, and this restart can take an arbitrary amount of time, so we had to split jobs as much as possible to minimize mutual disruption on restart. This negatively affects our UX and also restricts us from using Dataflow resources more efficiently, which currently leads to very high spending on Dataflow.

Describe the solution you'd like

I drafted a proposal that addresses those issues and suggests a new design for communication between the core service and the ingestion pipeline:
https://docs.google.com/document/d/1gqkCWZUyVBIU8OKhxIhIf1BBd3JrbOx2WnXrpGLFCAc/edit#heading=h.wigpvke4im4g

@woop
Member

woop commented Jun 1, 2020

Might be relevant to the following folks. @ches @Yanson @dr3s @idahoakl

TL;DR: We want to reduce the responsibility of the Job Coordinator to spin up new jobs when new feature sets are registered (or changed). The existing jobs should accommodate changes to the schemas.
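
To make the idea concrete, here is a minimal sketch of a long-running job that accepts new or changed feature set schemas in place instead of being restarted. Everything in it (`FeatureSetSpec`, `IngestionJob`, `apply_spec`, `validate`, the version check) is a hypothetical illustration, not the design from the proposal document and not existing Feast code.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class FeatureSetSpec:
    """Hypothetical, simplified stand-in for a feature set schema."""
    name: str
    version: int
    fields: Tuple[str, ...]


@dataclass
class IngestionJob:
    """Hypothetical long-running job that keeps its schemas in memory."""
    specs: Dict[str, FeatureSetSpec] = field(default_factory=dict)
    restarts: int = 0  # never incremented: spec updates do not restart the job

    def apply_spec(self, spec: FeatureSetSpec) -> bool:
        """Register a new or changed spec in place; ignore stale versions."""
        current = self.specs.get(spec.name)
        if current is not None and current.version >= spec.version:
            return False
        self.specs[spec.name] = spec
        return True

    def validate(self, feature_set: str, row: dict) -> bool:
        """Check a row against the latest known schema for its feature set."""
        spec = self.specs.get(feature_set)
        return spec is not None and set(row) == set(spec.fields)


job = IngestionJob()
job.apply_spec(FeatureSetSpec("driver", 1, ("id", "rating")))
ok_before = job.validate("driver", {"id": 1, "rating": 4.5})

# The schema changes while the job keeps running.
job.apply_spec(FeatureSetSpec("driver", 2, ("id", "rating", "trips")))
ok_after = job.validate("driver", {"id": 1, "rating": 4.5, "trips": 10})
```

In this sketch the coordinator would only need to deliver specs to a job that is already running; how the specs actually reach the job is left to the proposal.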

Please flag any concerns you have with this approach.
