Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call ConfigObject#Register() after #OnAllConfigLoaded() for runtime creation #10057

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented May 13, 2024

so that they're invisible until successful complete initialization. Otherwise some members may be still NULL and crash the daemon.

ref/IP/53428

What Icinga does with config

  1. Parse AST
  2. Eval AST, which produces ConfigItems
  3. Commit and register, for each item/object
  4. Call ConfigObject#OnAllConfigLoaded to init remaining members, e.g. Downtime#m_Checkable

Registering an object makes it immediately available for all object getters by type, as well as DSL get_host(), etc.. This is fine for the config load, "otherwise such a stupid config won't function properly." © @yhabteab

But not for runtime objects! ConfigObject#Start() has already been called for all and everything does something. E.g. IcingaDB#UpdateAllConfigObjects() operates on all objects as soon as registered – ready or not:

(gdb) frame 8
#8  0x0000000000f9ecac in icinga::IcingaDB::PrepareObject (object=...,
    attributes=..., checksums=...)
    at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:179
179     /usr/include/boost/smart_ptr/intrusive_ptr.hpp: No such file or directory.
(gdb) p downtime.px->m_Checkable
$6 = {px = 0x0}
(gdb)

closes #10151

@Al2Klimov Al2Klimov added bug Something isn't working area/configuration DSL, parser, compiler, error handling area/api REST API core/crash Shouldn't happen, requires attention ref/IP area/runtime Downtimes, comments, dependencies, events labels May 13, 2024
@cla-bot cla-bot bot added the cla/signed label May 13, 2024
@julianbrost julianbrost requested a review from yhabteab May 13, 2024 15:04
@Al2Klimov
Copy link
Member Author

CC @Wintermute2k6

Comment on lines +559 to +562

if (!registerEarly) {
item->m_Object->Register();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that supposed to fix the problem of objects becoming visible before they are fully loaded? With this PR you're adding yet another, IMHO useless parameter and introducing an inconsistency with the file-based configs, whilst there's a load_after feature intended for exactly such use cases. Why can't you solve it that way? Otherwise, this PR does not address the root cause of this problem, but it just works around it.

The only problem I'm seeing in using the load_after method to solve this problem is the relationship between Endpoint and Zone. The endpoint type is already set to load after Zone, however Zone still looks for endpoints in Zone::OnAllConfigLoaded() that have not yet been loaded and if you move that cached zone registration to Endpoint::OnAllConfigLoaded() there should be no further issues that I know of, and the object can be unconditionally registered here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Alex offline and he convinced me not to just rely on load_after, otherwise such a stupid config won't function properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding that "stupid config":

    if (!host) {
        var host = get_host(host_name)
    }

I expect this to be rendered within a Service definition as it uses the host_name attribute. Service specifies load_after Host;, so why wouldn't this function when relying on load_after?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See OP What Icinga does with config. load_after itself would function – Services will still be committed after Hosts. But during commit get_host() looks up registered objects. So the host has to be registered immediately after commit, not after OnAllConfigLoaded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so why wouldn't this function when relying on load_after?

If the objects were only registered after their OnAllConfigLoaded() was triggered, such stupid config would not work because they are evaluated while the object is being committed. However, a load_after dependency does not necessarily mean that a service is blocked from being committed before its host is committed and OnAllConfigLoaded is triggered, it just ensures that a service is committed after the host types, but before Host#OnAllConfigLoaded is triggered and a Service#OnAllConfigLoaded is triggered after the host equivalents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand correctly, relying on load_after wouldn't actually break get_host() in that example but rather require OnAllConfigLoaded() and Register() to be called incrementally after each type is loaded. And this would basically violate what the name OnAllConfigLoaded() advertises and most likely cause problems with that strange cyclic relation that exists between Zone and Endpoint.

Yes, this PR started with my suggestion/question whether it would be possible to avoid exposing partially initialized objects at all instead of checking for partially initialized objects in Icinga DB specifically to fix this bug. In the meantime, it was brought to light that there exists code that relies on seeing objects before they were fully initialized, so that suggestion won't work out. This PR tries to make it work nonetheless by registering different objects at different times during their initialization phase, but to be honest, I don't think that's better than the initial suggestion to skip objects still being initialized in the Icinga DB feature. Quite the opposite: this PR makes the config loading even more complex without achieving the goal I initially had in mind (which turned out to be impossible as other code relies on what I suggested to prevent).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite the opposite: this PR makes the config loading even more complex without achieving the goal I initially had in mind (which turned out to be impossible as other code relies on what I suggested to prevent).

Exactly! We either refactor the behaviour and let a load_after dependency block until the parent type is committed and its OnAllConfigLoaded() is triggered all at once, which is no easy task without breaking something, or just use a damn nullptr check where it causes problems now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's better than the initial suggestion to skip objects still being initialized in the Icinga DB feature.

For now, only the Icinga DB feature turned out to suffer from a yet incomplete Downtime. Instead of relying on that...

just use a damn nullptr check where it causes problems now.

... and working around that "leak" of uninited objects, I suggest to just plug it via this PR.

I mean, once OnAllConfigLoaded() gets called initially, I (and the code) expect further object lookups to return only finished objects. That's what I fixed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, once OnAllConfigLoaded() gets called initially, I (and the code) expect further object lookups to return only finished objects.

Doesn't Icinga 2 behave exactly as you expect it to behave even without this PR? Isn't the problem that the Icinga DB feature accesses objects too soon that are not yet fully initialised, i.e. before their OnAllConfigLoaded signal is triggered simply by going directly through the ctype objects vector? This PR just introduces an edge case and diverges runtime created and file based objects and makes the codebase even harder to follow what is going on. If you are keen on this, then please just fix this by using the existing load_after feature, which I doubt will make it into the upcoming bug fixe releases. By the way, there is also another PR from last week that addresses this issue #10151.

@Al2Klimov
Copy link
Member Author

Btw. the only runtime child objects origin from apply rules and for them such "late" registration is still soon enough:

apply Service "get_host" {
check_command = "dummy"
log(get_host(host.name))
assign where true
}
curl -sSiku root:123456 -X PUT -H 'Accept: application/json' 'https://127.0.0.1:5665/v1/objects/hosts/h' -d '{"pretty":1,"attrs":{"check_command":"dummy"}}'
[2024-05-16 10:57:20 +0200] information/ApiListener: New client connection from [::ffff:127.0.0.1]:52732 (no client certificate)
[2024-05-16 10:57:20 +0200] information/config: Object of type 'Host'
[2024-05-16 10:57:20 +0200] information/ConfigObjectUtility: Created and activated object 'h' of type 'Host'.
[2024-05-16 10:57:20 +0200] information/HttpServerConnection: Request PUT /v1/objects/hosts/h (from [::ffff:127.0.0.1]:52732, user: root, agent: curl/8.6.0, status: OK) took 12ms.

👍

@julianbrost
Copy link
Contributor

So this change only has an effect for objects created via ConfigObjectUtility::CreateObject(), i.e. runtime-created objects. I'm not sure this would actually fix the problem reported in ref/IP/53428, where you concluded that the null dereference happend while starting the Icinga DB feature, so doesn't that suggest that it happened when starting Icinga 2, not due to a runtime object creation?

@Al2Klimov
Copy link
Member Author

Immediately after Icinga 2 start, the Icinga DB feature dumps all objects, once connected to Redis. This can take long enough for new runtime objects to appear. If now PrepareObject() (see OP) from UpdateAllConfigObjects() happens just before Downtime#OnAllConfigLoaded(), Icinga crashes as in ref/IP/53428.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API area/configuration DSL, parser, compiler, error handling area/runtime Downtimes, comments, dependencies, events bug Something isn't working cla/signed core/crash Shouldn't happen, requires attention ref/IP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants