
api: enable support for setting original job source #16763

Merged · 10 commits · Apr 11, 2023

Conversation


@shoenig commented Apr 3, 2023

Note to reviewers: this PR isn't as large as it looks - most file changes are just due to UpsertJob getting a new signature, and that is used in many, many tests.

This PR adds support for setting job source material along with the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for making queries for the original source of a job. The HTTP endpoint is /v1/job/<id>/submission?version=<version> and the RPC method is Job.GetJobSubmission. This endpoint is limited by ACLs with read-job capability. It is quite similar to the JobVersions endpoint, in that data is upserted and removed along with a Job, but queryable through a separate endpoint.
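For illustration only (not code from this PR), fetching the stored source over the new endpoint might look like the sketch below; the agent address, job ID "example", and version 0 are placeholders.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Query the submission endpoint for version 0 of the job "example".
	// 4646 is the default Nomad HTTP port; adjust for a real cluster.
	url := "http://127.0.0.1:4646/v1/job/example/submission?version=0"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// The response body is the JSON-encoded job submission, if one was stored.
	fmt.Println(string(body))
}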

The job source (if submitted; doing so is always optional) is stored in the job_submission memdb table, separately from the actual job. This way we do not incur the overhead of reading the potentially huge string field throughout normal job operations.

The server config now includes job_max_source_size for configuring the maximum size the job source may be, before the server simply drops the source material. This should help prevent Bad Things from happening when huge jobs are submitted. If the value is set to 0, all job source material will be dropped.

Will add e2e tests in a followup PR; this one is already too large.

No backports; this is a 1.6 feature.

@shoenig changed the title from "wip: api support for hcl in ui" to "api: enable support for setting original source alongside job" on Apr 4, 2023
@shoenig changed the title from "api: enable support for setting original source alongside job" to "api: enable support for setting original job source" on Apr 5, 2023

@schmichael left a comment


This is really exciting! Great work.

Comment on lines +900 to +902
// Variables contains the opaque variables configuration as coming from
// a var-file or the WebUI variables input (hcl2 only).
Variables string
Member

Should we make this a map[string]string to preserve each filename and its contents?

Member Author

Since the parse endpoint is really only used by the webUI, and the webUI's variable content comes from an HTML form, I don't think it quite makes sense to associate the content with a file name.

Comment on lines 750 to 755
// writeVariablesFile writes content to a temporary file that is to be read by
// the hcl parser. If content is empty nothing is written and nil is returned.
// The return value is otherwise a one element slice with the filename of the
// temporary file. Also returned is a cleanup function that must be called by
// the caller for removing the temporary file once it is no longer needed.
func writeVariablesFile(content string) ([]string, func(), error) {
Member

I would love to avoid disk writes. We've been very careful to restrict those to Raft on Servers, and AFAICT these files would be written before any size limitations were applied. It shouldn't matter, but at the very least its existence introduces a new consideration when diagnosing performance.

If Variables became a map[string]string of filenames -> contents, then we could add a new parser entrypoint func in jobspec2/parse.go to skip the actual filesystem bits. AFAICT the hcl library itself only wants a filename and does not perform any IO: only our wrapper funcs in jobspec2 perform IO. (I'm going to be sad if I'm wrong)
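A minimal sketch of that point (illustrative, not code from this PR or the jobspec2 package): the upstream hcl library parses in-memory bytes and uses the filename argument only as a label for diagnostics, so no file IO is required.

package main

import (
	"fmt"

	"github.com/hashicorp/hcl/v2/hclparse"
)

func main() {
	// Variable content as it might arrive from the web UI form; the
	// "filename" below is only a diagnostic label, nothing is read from disk.
	src := []byte(`region = "us-east-1"`)

	parser := hclparse.NewParser()
	file, diags := parser.ParseHCL(src, "webui-vars.hcl")
	if diags.HasErrors() {
		fmt.Println(diags.Error())
		return
	}
	fmt.Printf("parsed %d bytes without touching the filesystem\n", len(file.Bytes))
}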

Member

Oh right, and this is for the Parse endpoint, which only requires read-job, so it seems bad form to allow it to perform disk IO.

Member Author

For some reason I thought it was the hcl library requiring file names, but yeah, it's just our own jobspec2 package. I expanded ParseOptions to also enable setting raw variable file content, and now just use that, eliminating the need for files. eaddac3

}

if totalSize > maxSize {
return nil
Member

It would be nice to give users some indication of what happened to their submission (if maxSize is > 0).

Member Author

In 1699883 I moved this check out of the HTTP layer and into the RPC layer, where we do lots of job admission controller things. That way we get to set a warning here, and avoid reading server config on clients, which, as you point out below, doesn't exist.
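Roughly, the idea is something like the sketch below (names, field layout, and the warning text are assumptions for illustration, not the actual admission hook from 1699883):

package main

import "fmt"

// checkSourceSize is a hypothetical stand-in for the admission-time check:
// if the submitted source material exceeds the configured maximum (or the
// feature is disabled with a max of 0), drop it and return a warning rather
// than failing the registration.
func checkSourceSize(source, variables string, maxSize int) (keep bool, warning string) {
	total := len(source) + len(variables)
	if maxSize == 0 {
		return false, "job source material is disabled (max source size is 0); submission dropped"
	}
	if total > maxSize {
		return false, fmt.Sprintf("job source of %d bytes exceeds maximum of %d bytes; submission dropped", total, maxSize)
	}
	return true, ""
}

func main() {
	keep, warn := checkSourceSize(`job "example" {}`, "", 8)
	fmt.Println(keep, warn) // false job source of 16 bytes exceeds maximum of 8 bytes; submission dropped
}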

@@ -410,8 +450,13 @@ func (s *HTTPServer) jobUpdate(resp http.ResponseWriter, req *http.Request, jobI
}

sJob, writeReq := s.apiJobAndRequestToStructs(args.Job, req, args.WriteRequest)
maxSubmissionSize := s.agent.Server().GetConfig().JobMaxSourceSize
Member

I think this will panic on Clients.

command/agent/job_endpoint.go (resolved)
@@ -1631,23 +1631,23 @@ func (s *StateStore) Nodes(ws memdb.WatchSet) (memdb.ResultIterator, error) {
}

// UpsertJob is used to register a job or update a job definition
-func (s *StateStore) UpsertJob(msgType structs.MessageType, index uint64, job *structs.Job) error {
+func (s *StateStore) UpsertJob(msgType structs.MessageType, index uint64, sub *structs.JobSubmission, job *structs.Job) error {
Member

Could have saved yourself a lot of code churn by just adding a new upsertJobImpl wrapper that took this, but nbd. 😅

// job structure originates from. It is up to the job submitter to include the source
// material, and as such sub may be nil, in which case nothing is stored.
func (s *StateStore) updateJobSubmission(index uint64, sub *structs.JobSubmission, namespace, jobID string, version uint64, txn *txn) error {
if sub == nil || namespace == "" || jobID == "" {
Member

Shouldn't namespace or jobID being empty be an error? Readers might assume ignoring all 3 cases implies some behavior that I don't think it's meant to imply.

Member Author

In f19721c we now return an error if namespace or jobID are not set

Comment on lines 4301 to 4305
// JobIndex is managed internally, not set.
//
// The raft index the Job this submission is associated with.
JobIndex uint64
}
Member

The JobModifyIndex specifically, right? I don't think it matters, but since Jobs have a whopping 3 indexes on them, it might be nice to specify if it's well known.

Member Author

Renamed to clarify in 37e71b2

@@ -122,6 +122,10 @@ The table below shows this endpoint's support for

- `Job` `(Job: <required>)` - Specifies the JSON definition of the job.

- `Submission` `(JobSubmission: <optional>)` - Specifies the original HCL/HCL2/JSON
definition of the job. This data is useful for reference only, it is not considered
for the actual registration of `Job`.
Member

Suggested change:
- for the actual registration of `Job`.
+ for the actual scheduling of `Job`.

A bit pedantic so nbd either way. I think arguably it is considered by the Job.Register operation, just not for any operational logic. Just saying "scheduling" might make the distinction more clear even if it's ignored by more than just scheduling logic.

@@ -317,6 +317,7 @@ func (c *JobRunCommand) Run(args []string) int {
PolicyOverride: override,
PreserveCounts: preserveCounts,
EvalPriority: evalPriority,
Submission: sub,
Member

Could we add a -drop-source flag (or similar) to allow people to opt out of submitting the source? I'm thinking it might be useful for batch/sysbatch jobs where you might be submitting a relatively large number in an automated way and don't want the overhead of the source.

Member

From offline discussion it's become apparent we need to add something to jobspecs to say "Don't save me!" so that operators don't have to change how they submit jobs depending on whether or not they want the source persisted.

queryMeta: &reply.QueryMeta,
run: func(ws memdb.WatchSet, state *state.StateStore) error {
// Look for the submission
out, err := state.JobSubmission(ws, args.RequestNamespace(), args.JobID, args.Version)
Member

We're submitting the jobspec with the parent job and there's no pointer from structs.Job to the structs.JobSubmission (which is totally reasonable). But that means for periodic jobs when we call deriveJob (and similarly for dispatch jobs) we're copying the structs.Job and changing the job ID, which leaves no way to get from the derived job to the job submission anymore.

Maybe we should detect the case where there's a parent job ID and pass that as the job ID parameter here instead?

Member

I like keeping this simple and leaving that up to higher layers (UI and API). I think job invocations (aka child dispatch or periodic jobs) should be as minimal as possible, and I think reflecting that in the low-level RPCs/APIs is important to let users know they're minimal.

I would hate to include the parent source and give a user the impression we're "wasting" space saving the same job submission for every invocation of a job. That could lead them to disable the feature erroneously thinking they'll get a big performance improvement.

Member

That's reasonable, but we'll need to let the UI folks know that and we should document that you're responsible for sending the parent ID in the API docs.

@shoenig commented Apr 10, 2023

In ca315b5 the state store is updated to prune all but the last structs.JobTrackedVersions (6) job submissions, ensuring we do not balloon the state store on stored job submissions.
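As a rough illustration of that pruning behavior (not the state store code itself; the constant value 6 comes from the comment above):

package main

import (
	"fmt"
	"sort"
)

// jobTrackedVersions mirrors the tracked-version count mentioned above (6).
const jobTrackedVersions = 6

// pruneSubmissionVersions keeps only the newest jobTrackedVersions versions
// and reports which older versions would be deleted.
func pruneSubmissionVersions(versions []uint64) (keep, drop []uint64) {
	sort.Slice(versions, func(i, j int) bool { return versions[i] > versions[j] })
	if len(versions) <= jobTrackedVersions {
		return versions, nil
	}
	return versions[:jobTrackedVersions], versions[jobTrackedVersions:]
}

func main() {
	keep, drop := pruneSubmissionVersions([]uint64{0, 1, 2, 3, 4, 5, 6, 7})
	fmt.Println(keep) // [7 6 5 4 3 2]
	fmt.Println(drop) // [1 0]
}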

@shoenig commented Apr 10, 2023

Should be good for another 👀

the job in the var file format.

- `VariableFlags` `(map[string]string: nil)` - Specifies HCL2 variables to use
during parsing of the job in key = value format.q
Member

Suggested change:
- during parsing of the job in key = value format.q
+ during parsing of the job in key = value format.

website/content/api-docs/jobs.mdx (resolved, outdated)
command/agent/config.go (resolved, outdated)
@@ -1248,7 +1310,8 @@ type JobRevertRequest struct {

// JobRegisterRequest is used to update a job
type JobRegisterRequest struct {
Job *Job
Submission *JobSubmission
Job *Job
Member

Suggested change:
- Job *Job
+ Job *Job

command/agent/job_endpoint.go (resolved)
3 participants