Datasource eval breakdown #12615

lbajolet-hashicorp · 2023-09-01T16:12:07Z

This PR aims to rework the logic around datasource evaluation, so we can execute each datasource separately, and not necessarily through the evaluateDatasources function.

This is made as a building block for the introduction later of the DAG.

Essentially, this PR adds some dependency list to the datasources. This list is for now limited to other datasources, but will likely change when we move to a more DAG-y workflow, where locals for example can become dependencies of the datasource.

With the change in execution behaviour, we also rework the evaluateDatasources (renamed to executeDatasources for consistency), and adopt a flow similar to the local variables evaluation.

⚠️ : bit messy since based on both the code from #12607 and #12608. Only the last two commits are relevant for this PR.

Not sure why this was defined and returned, but the value was set, but never used, as such this is not useful to keep in the code, so let's simplify this now.

Since datasources are recursively evaluated depending on their dependencies, we don't need to pre-execute those that depend on nothing, as the recursive traversal of the datasources will take care of that for us.

When a datasource fails to be evaluated because a cycle has been detected, we point out one of the links of the chain now so that users have a better idea of what to look at.

Datasources use their attribute's expressions to determine whether or not they depend on another datasource, in order to get the list of dependencies and execute them before executing a datasource. This code may be useful later on for figuring out the dependencies for any block, so we move this code to the utils.go file, and use this for datasources.

The previous implementation of the GetVarsByType function worked only on top-level attributes, ignoring the nested blocks in the structure. This implies that if a datasource depends on another through an expression within a nested block, we may not execute it first, and then executing this datasource before its dependent is possible, resulting in an error in the end. This commit is an attempt at making this more reliable for HCL configs, but only works on configs lifted from HCL files for now. We need to make this more reliable for later iterations.

In previous versions of Packer, the parser would load a HCL file (in either HCL or JSON format), and process it in phases. This process meant that if the file was not valid in the first place, we'd have some duplicate errors first. Another thing is that sources and build blocks were not being parsed at the beginning of the parsing phase at all, but were discovered later in the process, when the data sources have been executed and variables have been evaluated. This commit changes the parsing to happen all at once, so we parse the files with the parser, which will fail should one file be malformatted, then we visit the body once using the top-level schema for a Packer build template, handling misplaced blocks once in the process. Then, since the builds and sources may contain some dynamic blocks that need to be expanded once their context is ready (i.e. the variables/datasources they depend on), we can expand the build block for them, and populate their respective objects. This last step should also be adapted for datasources later, as they may need to support dynamic blocks, and be expanded lazily.

The `Datasources' type holds a map of datasource references to their respective block. In this map, the blocks were structures rather than pointer to structures, which makes updating their contents a harder process than it should be. To allow for in-place changes for the datasource blocks, we change the type of the blocks to be a pointer instead.

Since we want to allow for independently executing datasources when needed, we break down the logic to do so. Each datasource will now hold a list of dependencies. For now, these dependencies are all datasources themselves, but this may change in the future. Then, actual execution of the datasource is self-contained within a method of the structure, so that we can individually execute them. Finally, we replace the current sequential execution by a method of the PackerConfig with an approach akin to local variable evaluation, where we execute all the datasources one-by-one, and if they cannot be executed, move along. We retry this loop for as many times as we don't reach either a state where all the datasources have been executed; or when we reach a terminal state where datasources cannot be evaluated because of dependencies remaining, and no changes have occurred after a try.

In some cases, rather than walking through a full block of HCL in order to lift the references to other components of a build, we only have the expression to walk on. From this expression, we can lift the variables as a collection of hcl.Traversal, from which we then filter only those that interest us. So, since this is a use-case, we split the function that lifts the expressions from a hcl block and filter them based on the root type of the traversal into two separate functions.

As with datasources, we break down the local variable evaluation in two functions, one doing the evaluation for a local variable, and the sequential/recursive evaluation of all the local variables lifted from the packer template. This, plus dependency management will let us later reuse it for another evaluation strategy, while reusing those components as much as possible.

Local variables can't have a validation block in their definition, so this step in not useful and should be removed. Besides, since the validation was done on the local variables before evaluation, it did nothing at all, as the PackerConfig.LocalVariables collection gets populated during evaluation, so this is essentially a no-op, and can be safely removed.

lbajolet-hashicorp · 2023-09-06T20:07:41Z

Moving to draft; this will be where I push the scheduler code as I work on introducing it to the main branch.

The other PRs that it depends on can be reviewed and merged since they fix some issues with the current versions of Packer, but this one is essentially setting the stage for the DAG, as such this may be moved to another branch than main, but should be maintained up-to-date as much as possible.

Since hcl.Diagnostics supports extending it through the `Extend' method, we might as well use it instead of manually appending a series of diagnostics to it.

Since these constants and functions will be later useful for the extraction of the sequential logic in a scheduler, we change the visibility of those now.

When processing builds, we convert build blocks/sources to a series of CoreBuild, which contains the sequence of provisioners and post-processors to run, and the builder to use to boot the instance and create the image from. This process of converting builds used to be completely sequential, and orchestrated by the `GetBuilds' function. This gets broken down into a `ToCoreBuild' function on the build block, whose function is to lift and filter-out (based on --only/--except) the core builds from a valid build block, and this gets invoked from `GetBuilds'. This breakdown is useful as a preparation to the DAG's introduction to the code later on.

When a user specifies a --only/--except option in their command-line, and it turns-out to be unused, Packer signals it with a Warning diagnostic. This diagnostic, in previous versions of Packer, would only state that one option did not match anything, without stating which option did not match a build or post-processor. This commit tracks the usage of each option so we can report later which of those options was not used in this warning diagnostic.

Since those arguments will need to be updated from outside the hcl2template package, we change their visibility from private to public.

Since the initialization code was embedded in the sequential logic that we'll be moving away from soon, we move that to the build block itself, so we're able to invoke it in any order later on.

Since these will be used by the schedulers' code, we need to make it available from outside the `packer' package, so we make them public.

Since this will be required to be called from elsewhere later, we make this function public, and since it relies on data from config to be passed in at callsite, we may as well move this function to be a PackerConfig method instead.

Since printing the diagnostics to users requires using a map of filename to HCL files, we change this in the config definition for HCL templates.

Since we'll move the initialization code to the schedulers once they're in the codebase, we make the validation function public.

In order to be able to change how templates are processed later on, we move the logic that processes templates to a new abstraction: Scheduler. Right now, one implementation of Scheduler exists: sequential. This is essentially the same approach as what we have now embedded in the configs, but is now moved to this structure, so we can change it later.

As the last command that relies on the deprecated interfaces for interacting with a template, we move hcl2_upgrade to the schedulers, at least for the part that prepares the template for translation.

Since the Initialise/GetBuilds functions are not used anymore by Packer's commands directly, as they are superseded by the scheduler's implementations, we can remove them from the Handler interface.

lbajolet-hashicorp · 2023-09-19T20:14:05Z

This won't go into main any time soon, and we will probably building on this in a separate branch for the scheduler testing and the next evolutions, so I'll close this, and we can keep this branch up (probably rename it so it is clearer what its purpose will be).

github-actions · 2023-10-20T02:00:24Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

lbajolet-hashicorp added 9 commits August 24, 2023 16:14

hcl2template: remove unused shouldContinue bool

9dfb7a5

Not sure why this was defined and returned, but the value was set, but never used, as such this is not useful to keep in the code, so let's simplify this now.

hcl2template: simplify datasource evaluation

8f22241

Since datasources are recursively evaluated depending on their dependencies, we don't need to pre-execute those that depend on nothing, as the recursive traversal of the datasources will take care of that for us.

hcl2template: report localtion for cycle detection

47d2aff

When a datasource fails to be evaluated because a cycle has been detected, we point out one of the links of the chain now so that users have a better idea of what to look at.

hcl2template: simplify startDatasource function

87dbdb1

lbajolet-hashicorp requested a review from a team as a code owner September 1, 2023 16:12

lbajolet-hashicorp added 3 commits September 6, 2023 10:16

lbajolet-hashicorp marked this pull request as draft September 6, 2023 20:06

hcl2template: use diags.Extend instead of append

58b078f

Since hcl.Diagnostics supports extending it through the `Extend' method, we might as well use it instead of manually appending a series of diagnostics to it.

lbajolet-hashicorp force-pushed the datasource_eval_breakdown branch from c91b2c7 to 57cf6b8 Compare September 11, 2023 15:09

lbajolet-hashicorp added 8 commits September 19, 2023 10:35

hcl2template: make consts and functions public

996c4a8

Since these constants and functions will be later useful for the extraction of the sequential logic in a scheduler, we change the visibility of those now.

hcl2template: make config's cmd arguments public

d4ce57d

Since those arguments will need to be updated from outside the hcl2template package, we change their visibility from private to public.

hcl2template: move initialization logic to builds

01651b1

Since the initialization code was embedded in the sequential logic that we'll be moving away from soon, we move that to the build block itself, so we're able to invoke it in any order later on.

packer: change visibility of builds and variables

05199ec

Since these will be used by the schedulers' code, we need to make it available from outside the `packer' package, so we make them public.

hcl2template: move duplicate check to PackerConfig

0de1e18

Since this will be required to be called from elsewhere later, we make this function public, and since it relies on data from config to be passed in at callsite, we may as well move this function to be a PackerConfig method instead.

hcl2template: keep hcl2 files in a map

4be4c6e

Since printing the diagnostics to users requires using a map of filename to HCL files, we change this in the config definition for HCL templates.

lbajolet-hashicorp force-pushed the datasource_eval_breakdown branch from 1819bb2 to 72bd2f5 Compare September 19, 2023 14:37

lbajolet-hashicorp added 4 commits September 19, 2023 11:16

packer: make Core.Validate public

0065f73

Since we'll move the initialization code to the schedulers once they're in the codebase, we make the validation function public.

command: migrate validate to schedulers

61a3d6a

command: move inspect to schedulers

3556c55

command: move console to schedulers

3be2dea

lbajolet-hashicorp force-pushed the datasource_eval_breakdown branch from 72bd2f5 to e8eba9e Compare September 19, 2023 18:52

lbajolet-hashicorp added 2 commits September 19, 2023 16:10

command: migrate hcl2_upgrade to schedulers

9e0c354

As the last command that relies on the deprecated interfaces for interacting with a template, we move hcl2_upgrade to the schedulers, at least for the part that prepares the template for translation.

packer: remove Initialise/GetBuilds from handler

31c4925

Since the Initialise/GetBuilds functions are not used anymore by Packer's commands directly, as they are superseded by the scheduler's implementations, we can remove them from the Handler interface.

lbajolet-hashicorp force-pushed the datasource_eval_breakdown branch from e8eba9e to 31c4925 Compare September 19, 2023 20:11

lbajolet-hashicorp closed this Sep 19, 2023

lbajolet-hashicorp deleted the datasource_eval_breakdown branch September 19, 2023 20:14

github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Datasource eval breakdown #12615

Datasource eval breakdown #12615

lbajolet-hashicorp commented Sep 1, 2023

lbajolet-hashicorp commented Sep 6, 2023

lbajolet-hashicorp commented Sep 19, 2023

github-actions bot commented Oct 20, 2023

Datasource eval breakdown #12615

Datasource eval breakdown #12615

Conversation

lbajolet-hashicorp commented Sep 1, 2023

lbajolet-hashicorp commented Sep 6, 2023

lbajolet-hashicorp commented Sep 19, 2023

github-actions bot commented Oct 20, 2023