Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicated cloud config entries after installation #2492

Closed
Tracked by #2535
mudler opened this issue Apr 19, 2024 · 37 comments
Closed
Tracked by #2535

duplicated cloud config entries after installation #2492

mudler opened this issue Apr 19, 2024 · 37 comments
Assignees
Labels
bug Something isn't working prio: high unconfirmed

Comments

@mudler
Copy link
Member

mudler commented Apr 19, 2024

image (16)

Looks like after installation all the list items aren't merged, but appended

@mudler mudler added bug Something isn't working triage Add this label to issues that should be triaged and prioretized in the next planning call unconfirmed labels Apr 19, 2024
@mudler mudler mentioned this issue Apr 19, 2024
33 tasks
@mauromorales
Copy link
Member

@mudler isn't this happening on Stylus, only?

@ci-robbot ci-robbot added the question Further information is requested label Apr 19, 2024
@nianyush
Copy link

nianyush commented Apr 19, 2024

@mauromorales i think this can happen with kairos also. For example, if you have two identical cloudconfigs with something like this

#cloud-config
stages:
  initramfs:
    - users:
        kairos:
          groups:
            - sudo
          passwd: kairos

logic here would append stages in initramfs

@nianyush
Copy link

nianyush commented Apr 19, 2024

Since golang slices by default is not a comparable type, this probably is not a bug but an expected behavior. But maybe we can implement some custom comparators for different stages to merge slices instead of appending them.
For reference,
https://github.com/knadh/koanf?tab=readme-ov-file#custom-merge-strategies

@mauromorales
Copy link
Member

@nianyush ah got it, sorry I thought it was generated code, but indeed if the user requests the same users multiple times they are just appened.

Hmm I would just be afraid that this kind of logic can make the parser "too" smart. When should it DRY config, and when not? Specially because user can be added at different stages, including also the users attribute. So there has to be some sort of logic saying, wait you already added that user in initrafms so no need to have it at a later stage.

@nianyush
Copy link

nianyush commented Apr 19, 2024

Yes you are right. And user might want to execute a same stage more than once but just in different postitions of initramfs

@ci-robbot
Copy link
Collaborator

Hello! I'm a bot, an experiment of @mudler and @jimmykarily. Thank you for opening the issue about the duplicated cloud config entries after the installation of Kairos. I have added the 'triage' label to your issue, which means that it has been reviewed and we're in the process of determining whether it's a bug or not. If you have more information to provide or any additional questions, feel free to add a comment to the issue. Thank you for using Kairos and for reporting this issue!

@jimmykarily
Copy link
Contributor

Should we close this then? Is it about cleaning up the configs or can't these multiplied entries not be avoided?

@jimmykarily
Copy link
Contributor

Relevant old issue: #1341

@Itxaka
Copy link
Member

Itxaka commented Apr 22, 2024

This should fix this issue, mainly making it irrelevant to have multiple users, as it wont trigger any side effects:

mudler/yip#145

@mudler
Copy link
Member Author

mudler commented Apr 22, 2024

Also related: #1665

@mudler mudler added prio: high and removed question Further information is requested labels Apr 22, 2024
@kreeuwijk
Copy link

I checked the CanvOS build logic and it only copies in the cloud-config data once to the ISO:

COPY --if-exists user-data /overlay/files-iso/config.yaml

But somehow, this file gets processed twice by Kairos, resulting in the duplicate entries. If we can remove the double processing, the duplicate entries problem becomes a LOT smaller.

@nianyush
Copy link

nianyush commented May 1, 2024

Seems like kairos is trying to read cloudconfigs under /system/oem/ and config.yaml under root of iso twice while generating 90_custom

@nianyush
Copy link

nianyush commented May 1, 2024

We used an provider to handle installation event and the stages injected through that provider is not duplicated.

@kreeuwijk
Copy link

@nianyush the user-data is getting processed twice, the contents are just merged, not injected. Merging causes lists to have duplicate entries, but regular values merge into a single result.

Could this be causing the double processing?

@nianyush
Copy link

nianyush commented May 2, 2024

@kreeuwijk no I have verified removing this line would not help. I will push a fix to remove this line in CanvOS

@kreeuwijk
Copy link

Based on some troubleshooting, the file /tmp/kairos-install-config-xxx.yaml165531931 at ISO installation time already contains the duplicate entries. This file gets written out by dumpCCStringToFile, which gets the data pretty much directly from cloud-init.

So unless that section of the Install() function does unwanted duplication, the problem must be in cloud-init itself?

@kreeuwijk
Copy link

Hmm even if I remove the config.yaml from the ISO entirely, I can see that the "Copying files to persistent path" step from /system/oem/80_stylus_uki.yaml in the after-install stage is copied in twice.

@mudler I'm starting to suspect that this is the culprit of all the duplication: https://github.com/kairos-io/kairos-agent/blob/main/internal/agent/install.go#L102-L103

@mudler
Copy link
Member Author

mudler commented May 2, 2024

@kreeuwijk right, we need to check if the collector is responsible indeed and we need to have a deeper look at it. That is the only place where it can actually happen we can duplicate config files.

Meanwhile mudler/yip#147 should have addressed #2488 and this issue shouldn't now be anymore blocking (even if not nice to have as a bug).

@mudler
Copy link
Member Author

mudler commented May 2, 2024

Also https://github.com/mudler/yip/blob/93948138a7b6826ff8344952874faf48e08e0400/pkg/plugins/datasource.go#L250 could have caused this, but we addressed that specific case already in kairos-io/kairos-sdk#58 , but maybe we still do some mangling in the system cloud configs that could affect that? https://github.com/kairos-io/packages/blob/1a6d7667700a3fdf09eabd8729885c10117bffd6/packages/static/kairos-overlay-files/files/system/oem/00_datasource.yaml#L22

@kreeuwijk when you see duplicated content, are you supplying the cloud config on the userdata? or it always happens in any case?

@kreeuwijk
Copy link

kreeuwijk commented May 2, 2024

@mudler what does mudler/yip#147 do exactly? Does it only run one of the steps if it find duplicate entries?

I'm specifically seeing 2 things get duplicated:

As the content gets merged, you only see the duplication in practice for lists, hence it shows up in stages and in stylus.site.deviceUIDPaths (as that's a list too).

@mauromorales
Copy link
Member

mauromorales commented May 2, 2024

@kreeuwijk mudler/yip#147 will find the x duplicated users for example, and will run all of them but in serial. This to avoid any racecondition. There's also another fix that will try to reuse the uid if the user already exists mudler/yip#145

@kreeuwijk
Copy link

Ah I see, so you will end up with e.g. multiple entries in /etc/passwd but at least the entries all have the same uid

@mauromorales
Copy link
Member

@kreeuwijk actually just one entry because the User plugin for yip, will actually "update" the user

@kreeuwijk
Copy link

Hmmm I'll check that as I saw the kairos user end up twice in /etc/passwd

@mauromorales
Copy link
Member

@kreeuwijk yes I think that cold come from mudler/yip#145 because it's only checking if a user already exists, but not if a group already exists. So if you have 2 times the same user creation, it will create it twice (with 2 groups too) but it will try to reuse the uid from the first one for the second user ... I'm not sure if this is better than letting it just take a new user, we would need to investigate why it was done this way

@kreeuwijk
Copy link

As long as multiple entries in /etc/passwd or /etc/group don't break anything, I guess it's fine? But I've already seen at least 1 instance of an empty /etc/group (that would go back to normal by rebooting), so I think it's still not entirely error-free. But at least we're getting closer to squashing this bug.

@mauromorales
Copy link
Member

mauromorales commented May 2, 2024

yeah, I mean we have to make a decision here as to what constitutes a duplication, but the problem is that yaml can take anything without its context. So let's say for a user, maybe a duplicated username is enough to say this is a duplication, and the merge will take the last found (which can also be problematic because maybe you put different passwords there and the one that ends up taken is the last with another password), but every other item can also have its own context, running a command could get merged, but maybe you actually wanted to run two of them because you are doing a sync or a flush of something, and now we would only run one of it. I really think the moment we start doing "smart" merging things, we will break something for someone else, unless yip plugins, were the ones that had definitions of what constitutes a duplication for this particular scenario. What do you think @mudler ?

@kreeuwijk For me, one of the questions is, why does the canvas cloud-config file contain at least 2 creations of the user (one with a name and one without)? And can we just have 1 instead?

@kreeuwijk
Copy link

I would rather just avoid the duplication in the first place, instead of trying to deal with the fallout of duplication.

@kreeuwijk
Copy link

@mauromorales where are you seeing CanvOS make a user twice?

@mauromorales
Copy link
Member

@kreeuwijk yes that is my point that the configs already come with the duplication in the first place. The picture that was placed in the issue, comes from an installation using CanvOS. I never get such a duplication if I install a Kairos system with a single user

@mauromorales
Copy link
Member

you can simply see it because there is:

    - users:
        kairos:
          groups:
            - sudo
          passwd: kairos

and also

    - name: Create Kairos user
       users:
        kairos:
          groups:
            - sudo
          passwd: kairos

@nianyush
Copy link

nianyush commented May 3, 2024

@mauromorales i think that's from an old test installation... maybe i was giving users in multiple places. But in my most recent installation , I don't have such behavior and I can confirm I have only defined users stage in user data once but not in any other cloud config. Which I still see that entry got duplicated

@mauromorales
Copy link
Member

mauromorales commented May 3, 2024

@nianyush could you share all your different cloud-configs that you are using on a system? My assumption is that when I search for creation of users in stylus https://github.com/search?q=repo%3Aspectrocloud%2Fstylus%20passwd%3A%20kairos&type=code I see so many of them, so if by any chance the system picks 2+ of those, then you will have duplications

@kreeuwijk
Copy link

Correct, CanvOS is not setting a kairos user by default.
Now the end-user that's operating CanvOS will often modify the users for the target image to their liking. A basic starting point is setting a password for kairos, otherwise you can't login:

stages:
  initramfs:
    - users:
        kairos:
          groups:
            - sudo
          passwd: kairos

This is what most user-data files in CanvOS have. On a flashed device, you see this in /oem/90_custom.yaml twice due to the duplication. I checked the /etc/passwd file on a device and it did only show 1 entry for kairos, so at least the user entry isn't getting duplicated in the file.

@nianyush
Copy link

nianyush commented May 3, 2024

@mauromorales til the point I was facing issues, stylus service hasn't really started. Only the provider plugin to handle installation event is ran

@mauromorales mauromorales self-assigned this May 3, 2024
@mauromorales mauromorales removed the triage Add this label to issues that should be triaged and prioretized in the next planning call label May 3, 2024
mauromorales added a commit to kairos-io/kairos-sdk that referenced this issue May 3, 2024
Relates to kairos-io/kairos#2492

Signed-off-by: Mauro Morales <[email protected]>
mauromorales added a commit to kairos-io/kairos-sdk that referenced this issue May 3, 2024
Relates to kairos-io/kairos#2492

Signed-off-by: Mauro Morales <[email protected]>
mauromorales added a commit to kairos-io/kairos-agent that referenced this issue May 3, 2024
@mauromorales
Copy link
Member

So far our best assumption of the issue is as follows: the issue only happens on Canvos because it happens to pass a cloud config via a payload through the install event. While on the provider Kairos this could only be experienced when doing a config via a QR but we don't really have tests for this and it's rarely tested manually.

  1. CanvOS comes with a /run/initramfs/live/config.yaml file, which is picked by Stylus modified accordingly and returned in a payload through cc.
  2. The agent produces a final configuration file merging everything it finds in (source)
  • All the base directories, which includes /run/initramfs/live config read once
  • The payload's config cc config read twice (not exactly the same config since it was modified by the provider, but this would explain why we see all the duplication)
  • Config stanzas in the cmdline

While we haven't been able to reproduce exactly, this steps differ from the original busines logic which should be:

  1. Read the config from all base directories (including /run/initramfs/live), and any config stanzas in the cmdline and produce a semi final config
  2. Overwrite any key given in a payload cloud config. The result is the actual final config

This means that the /run/initramfs/live/config.yaml file and the payload cloud config will still be read but instead of being merged, the payload one will have precedence.

@mauromorales
Copy link
Member

fix is working with provider payload now

mauromorales added a commit to kairos-io/kairos-agent that referenced this issue May 14, 2024
mauromorales added a commit to kairos-io/kairos-agent that referenced this issue May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working prio: high unconfirmed
Projects
Archived in project
Development

No branches or pull requests

7 participants