Describe the Bug
We have node manifests of the following form:
node /^service-(server|client)(?:-test)?-(\d+)/ {
  $myservice_role = $1
  $myservice_id   = $2
  case $myservice_role {
    'server': {
      # do one set of things
    }
    'client': {
      # do some different things
    }
    <...etc...>
  }
}
We have observed these nodes randomly swapping server and client behavior, as though their pattern match was for the wrong hostname. This only occurs with threading enabled.
Expected Behavior
A host's sense of its hostname should not be transposed with that of another host. This may in fact be a problem with regexp backreferences generally.
Steps to Reproduce
We're able to quickly reproduce the issue for debugging using ~20 hosts, all matching the node pattern, kicking off agent runs continuously (e.g. while /bin/true;do puppet agent --test;done). We use the following manifest to capture the misbehavior:
node /^thing-(foo|bar)(?:-test)?-(\d+)/ {
  $first_blob  = $1
  $second_blob = $2
  if $::hostname =~ /^thing-(foo|bar)(?:-test)?-(\d+)/ {
    if $1 == $first_blob and $2 == $second_blob {
      # all is good, just notify
      notify { 'regex_test_debug':
        message => "hostname=${::hostname} first_blob=${first_blob} second_blob=${second_blob}",
      }
    } else {
      # fail loudly
      fail("regex_test_debug: \$1=$1 \$2=$2 but \$first_blob=$first_blob and \$second_blob=$second_blob")
    }
  } else {
    fail("regex_test_debug: $::hostname does not match node regex")
  }
}
This gives us appropriate fail log entries whenever the node pattern match results do not correspond to the hostname:
2024-04-24T16:17:53.402-04:00 ERROR [qtp943675003-391] [puppetserver] Puppet Evaluation Error: Error while evaluating a Function Call, regex_test_debug: $1=bar $2=8 but $first_blob=foo and $second_blob=5 (file: /export/home/puppet/environments/production/manifests/regex_test.pp, line: 22, column: 7) on node thing-bar-8.example
Environment
Puppet Server 7.x (various, up to and including 7.15.0)
CentOS 7.9.2009, AlmaLinux 8.9
Additional Context
I believe I've identified where these nodes step on one another during concurrent agent runs, but the fix isn't as trivial as adding a semaphore[1].
In Resource::Type (lib/puppet/resource/type.rb) it seems that all nodes that match a given regexp node pattern share a single instance of this class. In my test case above we find it running around with the name __node_regexp__thing-foobar:-test-d, and concurrent agent runs report the same Ruby object_id.
This object is slightly stateful, in that the @match attribute appears to contain the most recent string matched against the name pattern. We observe a 1:1 correlation between the state of this attribute and what each affected agent run's thread believes the hostname pattern match object should contain. I'm not quite sure how it propagates back to the DSL, but it seems likely that this is where it's coming from.
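To make the hazard concrete, here is a minimal sketch (not Puppet's actual code; class and method names are invented) of why a single mutable @match attribute on a shared object misbehaves: when two evaluations interleave, the second write clobbers the first, and the first evaluation then reads the other host's match data.

```ruby
# Sketch of the shared-state hazard: one object, one @match slot,
# two interleaved "agent runs". Last writer wins, globally.
class SharedNodeType
  attr_reader :match

  def set_match(hostname, pattern)
    @match = pattern.match(hostname) # overwrites whatever was there
  end
end

pattern = /^thing-(foo|bar)(?:-test)?-(\d+)/
shared  = SharedNodeType.new

shared.set_match('thing-foo-5', pattern) # "thread A" matches its hostname
shared.set_match('thing-bar-8', pattern) # "thread B" clobbers the slot
captured = shared.match                  # thread A now reads B's data

puts captured[1] # => "bar", not "foo"
puts captured[2] # => "8", not "5"
```

This is exactly the shape of the failure in the reproduction manifest: the node's captured $1/$2 belong to a different hostname than the one being compiled.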
The correct fix isn't immediately obvious, but it probably takes one of the following forms:
- Guard the value of @match between the time it's set and the time a node is done with it. I am not sure how to definitively tell when it would be safe to unlock.
- "Slot" the value of @match based on the source string. This would mean that calling objects would need to be able to identify which pattern they mean by the string that matched it, and I haven't found an obvious attribute to use for the purpose.
- Force distinct Resource::Type objects for individual agents.
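As a rough illustration of the "slot" idea (keyed per thread rather than per source string, since a thread key is the easiest to demonstrate; all names here are hypothetical, not Puppet's API), the match data can be stored in a map so concurrent evaluations can't clobber each other:

```ruby
# Hypothetical per-thread slotting of match data, instead of one @match.
class SlottedNodeType
  def initialize(pattern)
    @pattern = pattern
    @matches = {}    # Thread => MatchData
    @lock    = Mutex.new
  end

  def match!(hostname)
    m = @pattern.match(hostname)
    @lock.synchronize { @matches[Thread.current] = m }
    m
  end

  def match
    @lock.synchronize { @matches[Thread.current] }
  end
end

node_type = SlottedNodeType.new(/^thing-(foo|bar)(?:-test)?-(\d+)/)
results   = {}
threads = %w[thing-foo-5 thing-bar-8].map do |host|
  Thread.new do
    node_type.match!(host)
    sleep 0.01                      # give the other thread time to write
    results[host] = node_type.match # still sees its own MatchData
  end
end
threads.each(&:join)
```

The real fix would need a key that survives however the match propagates back to the DSL, which is the part I haven't pinned down.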
Any approach I've thought of is nontrivial and should have some input from someone more expert in this codebase.
Our situation manifesting this bug is peculiar to the use of backreferences in node blocks in the DSL, but I can see it impacting anything that uses a pattern and results in a Resource::Type object being created.
...oh, and lest I forget, calls to #evaluate_code can run concurrently on the same Resource::Type object. This may (I am not sure and cannot prove it) cause recycled MatchScope objects downstream of both scope.ephemeral_from and code.safeevaluate to get overwritten across threads. Comments imply that those objects do get reused, but I haven't actually seen this to be the case so I'm not sure this is an issue. This aspect would be trivial to solve with a semaphore, if indeed it is a problem.
[1] Technically, if I wrap the whole of Parser::Compiler#evaluate_ast_node in an environment.lock.synchronize block, the problem goes away. However, this seems like too high-level a bottleneck; the correct fix probably belongs in Resource::Type.
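For reference, the footnoted workaround has the following general shape: serialize one method behind a single lock via Module#prepend. Everything here is a stand-in (FakeCompiler, LOCK); in the actual workaround the method is Parser::Compiler#evaluate_ast_node and the lock is the environment's lock.

```ruby
# Illustrative only: the coarse-grained serialization workaround.
require 'monitor'

LOCK = Monitor.new

module SerializeNodeEvaluation
  def evaluate_ast_node(hostname)
    # Too coarse: every node evaluation in the process funnels through here.
    LOCK.synchronize { super }
  end
end

class FakeCompiler
  prepend SerializeNodeEvaluation

  def evaluate_ast_node(hostname)
    /^thing-(foo|bar)-(\d+)/.match(hostname)
  end
end

m = FakeCompiler.new.evaluate_ast_node('thing-foo-5')
```

This makes the corruption disappear because no two node evaluations ever overlap, at the cost of serializing all catalog compilation through one lock.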