Clustered execution interval #60

Open
ari opened this issue May 5, 2018 · 3 comments

Comments

ari commented May 5, 2018

If I'm reading the code correctly, Zookeeper is used for locking (preventing two executions at once) but not for synchronising the execution timing.

So if I had a cluster of 5 applications, with a scheduled event once an hour, each application will try to run it once an hour and we'd get executions every 12 minutes on average.
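To make the arithmetic concrete, here is a small sketch that merges the firing times of several nodes running the same fixed-period job. The function names and the evenly spread start offsets are assumptions for illustration only; nothing here is taken from the project's code.

```python
def merged_run_times(start_offsets_min, period_min, horizon_min):
    """Return the sorted times (in minutes) at which any node fires before the horizon."""
    times = []
    for offset in start_offsets_min:
        t = offset
        while t < horizon_min:
            times.append(t)
            t += period_min
    return sorted(times)


def mean_gap(times):
    """Average distance between consecutive firing times."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


# Five nodes (hypothetical offsets), each firing once every 60 minutes.
runs = merged_run_times([0, 12, 24, 36, 48], period_min=60, horizon_min=600)
print(mean_gap(runs))  # 12.0
```

With uneven start offsets the individual gaps vary, but over a long horizon the average still tends towards 60 / 5 = 12 minutes.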

A possible solution might be to store the last run timestamp in the ZK node, and make that node persist between executions.
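A rough sketch of that idea, under loud assumptions: `SharedNode` is an in-memory stand-in for a persistent ZK node and is not a ZooKeeper client, and `try_claim_run` is a hypothetical helper. The versioned write imitates a conditional update, so only one node can record a given run.

```python
import threading


class SharedNode:
    """In-memory stand-in (assumption) for a persistent node holding the last run timestamp."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._version = 0

    def get(self):
        with self._lock:
            return self._value, self._version

    def set_if_version(self, value, expected_version):
        """Write only if nobody else has written since we read the node."""
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = value
            self._version += 1
            return True


def try_claim_run(node, now, interval):
    """Return True if this caller may run the job at time `now` (seconds)."""
    last_run, version = node.get()
    if last_run is not None and now - last_run < interval:
        return False  # another node already ran within the interval
    return node.set_if_version(now, version)


node = SharedNode()
print(try_claim_run(node, now=0, interval=3600))     # first node claims the run
print(try_claim_run(node, now=720, interval=3600))   # a second node 12 minutes later is refused
```

Because the timestamp outlives each execution, the interval is enforced across the whole cluster rather than per application.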

Contributor

andrus commented May 5, 2018

You are right that ZK is used here for locking only, not for centralized scheduling. As long as the clocks are in sync between the cluster nodes and the jobs don't finish really quickly, "losing" nodes simply abandon the run until the next scheduled event if they fail to obtain a lock within a short timeout. So in practice the job will still run once an hour.
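The "abandon the run if the lock is not obtained quickly" behaviour can be sketched roughly as follows. This uses a local `threading.Lock` as a stand-in for the ZK lock, and `run_if_lock_acquired` is a hypothetical name, not the project's API.

```python
import threading


def run_if_lock_acquired(lock, job, timeout_s):
    """Run `job` only if the lock is obtained within `timeout_s`; otherwise skip this run."""
    if not lock.acquire(timeout=timeout_s):
        return False  # "losing" node: give up until the next scheduled event
    try:
        job()
        return True
    finally:
        lock.release()


lock = threading.Lock()  # stand-in (assumption) for the shared cluster lock
executed = []

# The winner obtains the lock and runs the job.
print(run_if_lock_acquired(lock, lambda: executed.append("run"), timeout_s=0.01))

# While another holder keeps the lock, a competing attempt times out and skips.
lock.acquire()
print(run_if_lock_acquired(lock, lambda: executed.append("duplicate"), timeout_s=0.01))
lock.release()
```

The sketch also shows the caveat: if the job finishes and releases the lock before a competing node's timeout expires, that node acquires the lock and runs the job a second time.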

Having said that, the current situation is not ideal and we are looking at alternative architectures, one being a single centralized scheduler with job dispatching done via an event queue to the clustered "agent" nodes.

Author

ari commented May 5, 2018

If the jobs are on a fixedDelay rather than cron, then when they run will depend on when each app is started. But yes, a cron approach with good clock sync should be better in this case. I hadn't realised till now that this project supported that too.

A centralised scheduler adds a new single point of failure though. So using zookeeper as a single shared lock and timestamp might be a simpler solution, no?

Contributor

andrus commented May 5, 2018

Yeah, ZK can be used either for leader election for the scheduler (to avoid a single point of failure) or as an execution tracking mechanism (your timestamp suggestion ... I guess it will require a bit of fuzziness when a job decides whether to run or not).
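One way to picture that fuzziness is a tolerance window on the "is it time yet?" check. This is only a sketch: `should_run` and the 5-second tolerance are assumptions for illustration, not anything the project defines.

```python
def should_run(last_run, now, interval, tolerance):
    """Decide whether enough time has passed, allowing for small clock differences.

    All arguments are in seconds; `last_run` is None if the job has never run.
    """
    if last_run is None:
        return True
    return now - last_run >= interval - tolerance


# A node whose clock is 2 seconds behind sees 3598 s elapsed instead of 3600 s.
print(should_run(last_run=0, now=3598, interval=3600, tolerance=0))  # strict check skips the run
print(should_run(last_run=0, now=3598, interval=3600, tolerance=5))  # fuzzy check allows it
```

The tolerance has to stay well below the interval, otherwise it reintroduces the extra executions the timestamp is meant to prevent.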
