Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hcl2template: rework parsing logic #12607

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

lbajolet-hashicorp
Copy link
Contributor

In previous versions of Packer, the parser would load a HCL file (in either HCL or JSON format), and process it in phases. This process meant that if the file was not valid in the first place, we'd have some duplicate errors first.

Another thing is that sources and build blocks were not being parsed at the beginning of the parsing phase at all, but were discovered later in the process, when the data sources have been executed and variables have been evaluated.

This commit changes the parsing to happen all at once, so we parse the files with the parser, which will fail should one file be malformatted, then we visit the body once using the top-level schema for a Packer build template, handling misplaced blocks once in the process.

Then, since the builds and sources may contain some dynamic blocks that need to be expanded once their context is ready (i.e. the variables/datasources they depend on), we can expand the build block for them, and populate their respective objects.

This last step should also be adapted for datasources later, as they may need to support dynamic blocks, and be expanded lazily.

Copy link
Member

@nywilken nywilken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me. I left a few suggestions to start.

hcl2template/types.build.go Outdated Show resolved Hide resolved
hcl2template/parser.go Show resolved Hide resolved
hcl2template/parser.go Outdated Show resolved Hide resolved
hcl2template/types.build.go Outdated Show resolved Hide resolved
hcl2template/types.build.go Outdated Show resolved Hide resolved
hcl2template/types.build.go Outdated Show resolved Hide resolved
hcl2template/types.build.go Outdated Show resolved Hide resolved
hcl2template/types.build.go Outdated Show resolved Hide resolved
return nil
}

build.ready = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be set to true after all has been decoded? I see there are a few early returns if there are errors. In that case does this attribute represent that the build block has been fully decoded or that we've called finalizeDocode on it already?

At first I took it to mean that the build block is good to go and has been fully decoded. But I wonder how we capture the case where build.decoded is true but it decoded with error. If Packer fails then this case is probably moot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the build block fails to be decoded, it means some of its dependencies wasn't properly executed before, so the build should fail at that point. Here, we can consider it decoded after running it once, as it's not supposed to be executed multiple times.
If the decoding fails, so will Packer.

Name: "ubuntu-1204",
},
},
Builds: Builds{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not entirely clear to me why we need to initialize Builds and Sources with the rework. Is it because the calls to finalizeDecode requires valid blocks?

Copy link
Contributor Author

@lbajolet-hashicorp lbajolet-hashicorp Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, since the decode functions are part of the source or build blocks, we need them initialised with a reference to the block they're coming from.
This changes the output for the parsing tests, but fundamentally this doesn't change how Packer behaves from a user standpoint.
Though maybe if there's a structural issue with the source or build block, it will get caught as early as the parsing, and not after the datasources are executed.

Edit: tested this, and no, even with this change, if a build block is fundamentally invalid, we still execute them first. This might be fixable though, I'll see what I can do for this.

hcl2template/parser.go Outdated Show resolved Hide resolved
// Since the build's contents may not have been dynamically
// expanded when we first loaded the config from the template
// file, we decode it now.
diags = append(diags, build.finalizeDecode(cfg)...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there is some important context missing here. In the decodeFile you have a comment for the BuildBlock and SourceBlock cases that reads "The final decoding step (which requires an up-to-date context) will be done when we need it." but it is not clear form this line that the time is now.

Also correct me if I'm wrong but a call to finalilzeDecode requires that the buildBlock has been decoded at least once. So calling build.finalizeDecode on an zero value BuildBlock would do what?

It might be my understanding but there feels like there is a dependency chain that is not captured in the comments or in the type itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the build block is not completely decoded at this point, we only partially decode its top-level contents, and we certainly didn't try to expand the potential dynamic blocks inside it.
Same for the source blocks.

Regarding this comment, it's something I did not change, and that was how things worked before too. This initializeBlocks function basically expands the dynamic blocks, and filling-in the blanks in the structures that Packer uses.
At that point we do the decoding for all those, like provisioners, post-processors, and the sources themselves.

That being said, the may indicates that we could try before, but in the current implementation, this is very much not the case, we only get basic information from the blocks, and we need to call the finalizeDecode function on all the build and source blocks.

I'll change the phrasing here to remove this confusion, but does my explanation clarifies things for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, I feel that the confusion stems from the vocabulary not being clear, so I'll leave this:

  • decode (parser): lift the configuration from the HCL file and start building the components for the Packer config. At this point, we have not evaluated anything, nor should we. This step's purpose is only to parse the HCL, check if it is valid, and that there are no unexpected blocks/attributes in the config (for those we do know in advance what the format is).
  • evaluateVariables/executeDatasources: there, we decode (as gohcl.Decode) the configuration for the entity, and send it to the component responsible, which effectively executes it. For variables there's no execution here, only expressions to evaluate, hence the difference in terminology.
  • finalizeDecode (later): there, we expand the dynamic blocks where applicable, and we do as for datasources, we decode (as in gohcl.Decode), and send those configs to the plugins for validation.
  • build: finally, once everything passed the validation and we have *packer.CoreBuild objects, we can then execute the builds.

With this in mind, I think the decode name is not representative of what is done during parsing. What would you think of some alternatives like lift or load? This would make it less likely to have a conflict in naming, and the finalizeDecode functions can then becode decode only, as it's what they essentially do.

I struggle in finding an appropriate name for this step of preparation of the structures, so feel free to suggest anything else, I'm eager to see what you think may fit the purpose better.

In previous versions of Packer, the parser would load a HCL file (in
either HCL or JSON format), and process it in phases.
This process meant that if the file was not valid in the first place,
we'd have some duplicate errors first.

Another thing is that sources and build blocks were not being parsed at
the beginning of the parsing phase at all, but were discovered later in
the process, when the data sources have been executed and variables have
been evaluated.

This commit changes the parsing to happen all at once, so we parse the
files with the parser, which will fail should one file be malformatted,
then we visit the body once using the top-level schema for a Packer
build template, handling misplaced blocks once in the process.

Then, since the builds and sources may contain some dynamic blocks that
need to be expanded once their context is ready (i.e. the
variables/datasources they depend on), we can expand the build block for
them, and populate their respective objects.

This last step should also be adapted for datasources later, as they may
need to support dynamic blocks, and be expanded lazily.
@vercel
Copy link

vercel bot commented Oct 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
packer ⬜️ Ignored (Inspect) Visit Preview Oct 3, 2023 7:14pm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants