Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource usage and time metrics #370

Closed
m-mohr opened this issue Apr 2, 2021 · 13 comments · Fixed by #383
Closed

Resource usage and time metrics #370

m-mohr opened this issue Apr 2, 2021 · 13 comments · Fixed by #383

Comments

@m-mohr
Copy link
Member

m-mohr commented Apr 2, 2021

Could be done via:

  • .../logs? (works for all processing modes) - note: for sync. processing via Header
  • via the resource details (GET /jobs/:id and GET /services/:id)? (doesn't work for sync. processing)
  • via a new endpoint?

Could include:

  • total batch job duration (wall time)
  • total batch job cpu hours
  • total batch job memory hours

These metrics can be visualized in a web interface, which should give the user an idea of scalability.

@m-mohr m-mohr added the platform label Apr 2, 2021
@m-mohr m-mohr changed the title User visible overview of processing time metrics Overview of processing time metrics Apr 9, 2021
@m-mohr m-mohr added this to the 1.1.0 milestone Apr 9, 2021
@m-mohr
Copy link
Member Author

m-mohr commented Apr 9, 2021

Use Case:

The API shall provide capabilities for visual process monitoring in the distributed compute environment. Note: This shall provide relevant information through graphs/dashboards on e.g. orchestration, data throughput, CPU workloads, etc. so that users can optimise the efficiency of their code and thus reduce the costs required when switching to bulk processing. (req. 60)

@soxofaan
Copy link
Member

.../logs? (works for all processing modes)

.../logs is not available for sync processing, only batch and services, right?

Another solution could be custom response headers. This is probably the only option for sync processing (unless you make sync processing stateful).

For batch and services, something like /logs is probably a lot cleaner than custom response headers.

@m-mohr
Copy link
Member Author

m-mohr commented Apr 12, 2021

That was not clearly described: For synchronous processing, it would just be added to the log file that can be returned via the header, which is the equivalent to /logs in batch/services.

Sync. processing is stateful anyway if you implement either billing (which most services will do) or return log "files" via the header.

The advantage of log "files" would be that it can also do intermediate time metrics, e.g. after each process. That would certainly help to find out bottlenecks. That would be a reason against custom headers, especially as you'll often use sync processing for debugging and trying things out on smaller chunks, where you'd probably would like to get the metrics as detailed as possible.

So overall, it seems like the best option would indeed be ".../logs" (batch/services) / log-header (sync).

@m-mohr
Copy link
Member Author

m-mohr commented Apr 15, 2021

This is the last issue for API v1.1.0 right now that doesn't have a corresponding PR.
Before starting a PR, I'd need feedback on two details:

  1. Where to specify the metrics (see above)? It seems that there's a slight tendency towards the logs, which I think I'm also in favor of.
  2. What metrics to specify and which units do they have. Some examples are also mentioned above. This may need some investigation what software actually supports...

@m-mohr
Copy link
Member Author

m-mohr commented Apr 16, 2021

Instead of specifying a fixed unit for each metric, we could also make it self-descriptive:

{
	"metrics": {
		"cpu": {
			"value": 123,
			"unit": "cpu-seconds" // default
		},
		"memory": {
			"value": 123,
			"unit": "mb-seconds" / "gb-hours" / ... // default: ?
		},
		"time": {
			"value": 123,
			"unit": "seconds" / "minutes" / "hours" // default: minutes?
		},
		"network": {
			"value": 123,
			"unit": "b" / "kb" / ... / "tb" // default: mb?
		},
		"storage": {
			"value": 123,
			"unit": "b" / "kb" / ... / "tb" // default: mb?
		}
	}
}

see PR: #383

m-mohr added a commit that referenced this issue Apr 16, 2021
m-mohr added a commit that referenced this issue Apr 16, 2021
@m-mohr m-mohr linked a pull request Apr 16, 2021 that will close this issue
@m-mohr m-mohr changed the title Overview of processing time metrics Resource usage and time metrics Apr 16, 2021
@jdries
Copy link

jdries commented Apr 16, 2021

I'll try to figure out if accessing to /logs can be made user friendly enough. Maybe we can even do both, using /logs for all kinds of advanced use cases, and the same 'usage' properties inside the job metadata, where users can immediately see them?
Keep in mind that this information will also be the basis for accounting, so if we log usage for different steps, we also need a way to find the total usage, in an unambiguous way.

@m-mohr
Copy link
Member Author

m-mohr commented Apr 16, 2021

@jdries

I'll try to figure out if accessing to /logs can be made user friendly enough.

In the Web Editor it's implemented already with a UI. That could also be integrated into Jupyter.

Maybe we can even do both, using /logs for all kinds of advanced use cases, and the same 'usage' properties inside the job metadata, where users can immediately see them?

That is actually what PR #383 proposes.

Keep in mind that this information will also be the basis for accounting, so if we log usage for different steps, we also need a way to find the total usage, in an unambiguous way.

But that seems to be a back-end / implementation issue?! Nevertheless, should be solved in the mentioned PR.

@jdries
Copy link

jdries commented Apr 19, 2021

Apologies, I see it now indeed in the diff, so that seems reasonable.
Total usage needs to be unambiguous for the users as well, as it is the basis for their bill.

@m-mohr
Copy link
Member Author

m-mohr commented Apr 19, 2021

Total usage needs to be unambiguous for the users as well, as it is the basis for their bill.

@jdries I understand that this is important, but I'm not sure whether this is just a general remark or you'd want something specific to be changed/added/... in the current proposal? Is the proposal unambiguous?

@jdries
Copy link

jdries commented Apr 19, 2021

Was just a general remark, no changes needed!

@kempenep
Copy link

Shouldn't the units of memory and network be exchanged ?

@m-mohr
Copy link
Member Author

m-mohr commented Apr 22, 2021

@kempenep I don't think so, I've usually only seen those units in use. If you have more details on where this is handled differently, please let me know.

@m-mohr
Copy link
Member Author

m-mohr commented May 5, 2021

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants