Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datasource eval breakdown #12615

Closed
wants to merge 28 commits into from
Closed

Conversation

lbajolet-hashicorp
Copy link
Contributor

This PR aims to rework the logic around datasource evaluation, so we can execute each datasource separately, and not necessarily through the evaluateDatasources function.

This is made as a building block for the introduction later of the DAG.

Essentially, this PR adds some dependency list to the datasources. This list is for now limited to other datasources, but will likely change when we move to a more DAG-y workflow, where locals for example can become dependencies of the datasource.

With the change in execution behaviour, we also rework the evaluateDatasources (renamed to executeDatasources for consistency), and adopt a flow similar to the local variables evaluation.

⚠️ : bit messy since based on both the code from #12607 and #12608. Only the last two commits are relevant for this PR.

Not sure why this was defined and returned, but the value was set, but
never used, as such this is not useful to keep in the code, so let's
simplify this now.
Since datasources are recursively evaluated depending on their
dependencies, we don't need to pre-execute those that depend on nothing,
as the recursive traversal of the datasources will take care of that for
us.
When a datasource fails to be evaluated because a cycle has been
detected, we point out one of the links of the chain now so that users
have a better idea of what to look at.
Datasources use their attribute's expressions to determine whether or
not they depend on another datasource, in order to get the list of
dependencies and execute them before executing a datasource.

This code may be useful later on for figuring out the dependencies for
any block, so we move this code to the utils.go file, and use this for
datasources.
The previous implementation of the GetVarsByType function worked only on
top-level attributes, ignoring the nested blocks in the structure.

This implies that if a datasource depends on another through an
expression within a nested block, we may not execute it first, and then
executing this datasource before its dependent is possible, resulting in
an error in the end.

This commit is an attempt at making this more reliable for HCL configs,
but only works on configs lifted from HCL files for now. We need to make
this more reliable for later iterations.
In previous versions of Packer, the parser would load a HCL file (in
either HCL or JSON format), and process it in phases.
This process meant that if the file was not valid in the first place,
we'd have some duplicate errors first.

Another thing is that sources and build blocks were not being parsed at
the beginning of the parsing phase at all, but were discovered later in
the process, when the data sources have been executed and variables have
been evaluated.

This commit changes the parsing to happen all at once, so we parse the
files with the parser, which will fail should one file be malformatted,
then we visit the body once using the top-level schema for a Packer
build template, handling misplaced blocks once in the process.

Then, since the builds and sources may contain some dynamic blocks that
need to be expanded once their context is ready (i.e. the
variables/datasources they depend on), we can expand the build block for
them, and populate their respective objects.

This last step should also be adapted for datasources later, as they may
need to support dynamic blocks, and be expanded lazily.
The `Datasources' type holds a map of datasource references to their
respective block.
In this map, the blocks were structures rather than pointer to
structures, which makes updating their contents a harder process than it
should be.

To allow for in-place changes for the datasource blocks, we change the
type of the blocks to be a pointer instead.
Since we want to allow for independently executing datasources when
needed, we break down the logic to do so.

Each datasource will now hold a list of dependencies. For now, these
dependencies are all datasources themselves, but this may change in the
future.

Then, actual execution of the datasource is self-contained within a
method of the structure, so that we can individually execute them.

Finally, we replace the current sequential execution by a method of the
PackerConfig with an approach akin to local variable evaluation, where
we execute all the datasources one-by-one, and if they cannot be
executed, move along.
We retry this loop for as many times as we don't reach either a state
where all the datasources have been executed; or when we reach a
terminal state where datasources cannot be evaluated because of
dependencies remaining, and no changes have occurred after a try.
@lbajolet-hashicorp lbajolet-hashicorp requested a review from a team as a code owner September 1, 2023 16:12
In some cases, rather than walking through a full block of HCL in order
to lift the references to other components of a build, we only have the
expression to walk on.
From this expression, we can lift the variables as a collection of
hcl.Traversal, from which we then filter only those that interest us.

So, since this is a use-case, we split the function that lifts the
expressions from a hcl block and filter them based on the root type of
the traversal into two separate functions.
As with datasources, we break down the local variable evaluation in two
functions, one doing the evaluation for a local variable, and the
sequential/recursive evaluation of all the local variables lifted from
the packer template.
This, plus dependency management will let us later reuse it for another
evaluation strategy, while reusing those components as much as
possible.
Local variables can't have a validation block in their definition, so
this step in not useful and should be removed.

Besides, since the validation was done on the local variables before
evaluation, it did nothing at all, as the PackerConfig.LocalVariables
collection gets populated during evaluation, so this is essentially a
no-op, and can be safely removed.
@lbajolet-hashicorp lbajolet-hashicorp marked this pull request as draft September 6, 2023 20:06
@lbajolet-hashicorp
Copy link
Contributor Author

Moving to draft; this will be where I push the scheduler code as I work on introducing it to the main branch.

The other PRs that it depends on can be reviewed and merged since they fix some issues with the current versions of Packer, but this one is essentially setting the stage for the DAG, as such this may be moved to another branch than main, but should be maintained up-to-date as much as possible.

Since hcl.Diagnostics supports extending it through the `Extend' method,
we might as well use it instead of manually appending a series of
diagnostics to it.
Since these constants and functions will be later useful for the
extraction of the sequential logic in a scheduler, we change the
visibility of those now.
When processing builds, we convert build blocks/sources to a series of
CoreBuild, which contains the sequence of provisioners and
post-processors to run, and the builder to use to boot the instance and
create the image from.

This process of converting builds used to be completely sequential, and
orchestrated by the `GetBuilds' function.

This gets broken down into a `ToCoreBuild' function on the build block,
whose function is to lift and filter-out (based on --only/--except) the
core builds from a valid build block, and this gets invoked from
`GetBuilds'.
This breakdown is useful as a preparation to the DAG's introduction to
the code later on.
When a user specifies a --only/--except option in their command-line,
and it turns-out to be unused, Packer signals it with a Warning
diagnostic.

This diagnostic, in previous versions of Packer, would only state that
one option did not match anything, without stating which option did not
match a build or post-processor.

This commit tracks the usage of each option so we can report later which
of those options was not used in this warning diagnostic.
Since those arguments will need to be updated from outside the
hcl2template package, we change their visibility from private to public.
Since the initialization code was embedded in the sequential logic that
we'll be moving away from soon, we move that to the build block itself,
so we're able to invoke it in any order later on.
Since these will be used by the schedulers' code, we need to make it
available from outside the `packer' package, so we make them public.
Since this will be required to be called from elsewhere later, we make
this function public, and since it relies on data from config to be
passed in at callsite, we may as well move this function to be a
PackerConfig method instead.
Since printing the diagnostics to users requires using a map of
filename to HCL files, we change this in the config definition for HCL
templates.
Since we'll move the initialization code to the schedulers once they're
in the codebase, we make the validation function public.
In order to be able to change how templates are processed later on, we
move the logic that processes templates to a new abstraction: Scheduler.

Right now, one implementation of Scheduler exists: sequential. This is
essentially the same approach as what we have now embedded in the
configs, but is now moved to this structure, so we can change it later.
As the last command that relies on the deprecated interfaces for
interacting with a template, we move hcl2_upgrade to the schedulers, at
least for the part that prepares the template for translation.
Since the Initialise/GetBuilds functions are not used anymore by
Packer's commands directly, as they are superseded by the scheduler's
implementations, we can remove them from the Handler interface.
@lbajolet-hashicorp
Copy link
Contributor Author

This won't go into main any time soon, and we will probably building on this in a separate branch for the scheduler testing and the next evolutions, so I'll close this, and we can keep this branch up (probably rename it so it is clearer what its purpose will be).

@lbajolet-hashicorp lbajolet-hashicorp deleted the datasource_eval_breakdown branch September 19, 2023 20:14
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant