Collect and view results for user-terminated jobs #178

Closed
seagate-bt opened this issue Apr 21, 2014 · 2 comments

@seagate-bt

tl;dr
It would be great to have an option to end a job early such that its data is still collected and reported.

Currently with COSBench, after a job has completed, the results from the drivers are collected and reported, such as Bandwidth, Response-Time, Success-Ratio, etc. If a job is terminated during the "main" phase, the status shows as terminated and no data is collected or reported.
Hypothetical scenario: You have launched a long-running 20-hour job in COSBench to prove a new Swift cluster, for example. Suppose you are at the 18th hour and the network has to be restarted to install critical patches, say for the Heartbleed OpenSSL bug, and that cannot wait 2 more hours for the job to complete. From the GUI, I select the job and click a button to complete it prematurely. COSBench gracefully stops the work on the drivers and gathers the data on the controller for viewing. I can then see the bandwidth, response time, success ratio, etc. for the 18 of the 20 hours that did run. The status does not need to say "Success"; it could instead say "Conditional-Success", "Provisional-Success", or "Terminated-Success" to reflect that the job did not complete as submitted, yet was stopped at the operator's direction rather than by an error.

I don't know whether it is feasible to communicate with the drivers in this fashion. Mainly, I would like to be able to view the statistical data for jobs that do not complete.

@ywang19
Contributor

ywang19 commented Apr 21, 2014

It's certainly feasible. In fact, the controller has already received the data points up to the failure time, and we currently just discard them. We will consider supporting it. By the way, v0.4.0 beta2 will be uploaded this week; it includes a fix to avoid termination during long runs, so you may want to try it before this issue is resolved.
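
To illustrate the idea of keeping the data points instead of discarding them, here is a minimal Python sketch. It is not COSBench code: `Sample`, `JobState`, and `summarize` are hypothetical names, and the per-interval fields are assumed shapes for what a driver might report. It only shows that the same aggregation can run whether a job finished or was terminated by the operator.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Hypothetical states; the issue only suggests that a terminated job
    # should be distinguishable from a fully completed one.
    FINISHED = "finished"
    TERMINATED = "terminated"


@dataclass
class Sample:
    """One per-interval data point from a driver (assumed shape)."""
    ops: int              # operations attempted in the interval
    ok_ops: int           # operations that succeeded
    bytes_moved: int      # bytes transferred in the interval
    total_resp_ms: float  # summed response time of successful operations
    seconds: float        # length of the interval


def summarize(samples, state):
    """Aggregate whatever samples exist, regardless of how the job ended."""
    ops = sum(s.ops for s in samples)
    ok = sum(s.ok_ops for s in samples)
    secs = sum(s.seconds for s in samples)
    moved = sum(s.bytes_moved for s in samples)
    resp = sum(s.total_resp_ms for s in samples)
    return {
        "state": state.value,
        "bandwidth_bytes_per_s": moved / secs if secs else 0.0,
        "avg_resp_ms": resp / ok if ok else 0.0,
        "success_ratio": ok / ops if ops else 0.0,
    }
```

Calling `summarize(collected_samples, JobState.TERMINATED)` on the samples gathered before the stop would give a partial report that is clearly labeled as coming from a terminated job.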

@ywang19 ywang19 added this to the 0.4.1 milestone Jun 30, 2014
@ywang19 ywang19 added 3 - Done and removed 3 - Done labels Jul 2, 2014
@ywang19
Contributor

ywang19 commented Jul 2, 2014

It's already included in the latest v0.4.0.1 code base.

@ywang19 ywang19 closed this as completed Jul 2, 2014