Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifying disk quotas causes unnecessary file attribute updates #13115

Closed
mariusklocke opened this issue Mar 12, 2024 · 11 comments · Fixed by #14019
Closed

Modifying disk quotas causes unnecessary file attribute updates #13115

mariusklocke opened this issue Mar 12, 2024 · 11 comments · Fixed by #14019
Assignees
Milestone

Comments

@mariusklocke
Copy link

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04 LTS
  • Storage Backend: dir Backend on XFS with project quotas enabled
  • The output of snap list --all lxd core20 core22 core24 snapd:
core20  20231123       2105   latest/stable  canonical✓  base,disabled
core20  20240111       2182   latest/stable  canonical✓  base
lxd     5.0.2-d4d8da9  26741  5.0/stable     canonical✓  disabled,held
lxd     5.0.3-51452c3  26881  5.0/stable     canonical✓  held
snapd   2.61.1         20671  latest/stable  canonical✓  snapd,disabled
snapd   2.61.2         21184  latest/stable  canonical✓  snapd

Issue description

Updating the root disk size in LXD takes longer for containers with many files, because every file in the root disk is touched. I observed cases where the instance update took more than 5 minutes. LXD updates the fsx_projid and fsx_xflags attributes of all "regular" files. These updates are no-op from my point of view and should be skipped. I cannot image a scenario where there would be a change for those attributes during the lifecycle of a container. Touching every file slows down instance updates although there are no actual changes to the file attributes. I think the code responsible is in here. I think it should be the same issue on EXT4 (not tested).

Steps to reproduce

  1. Create a storage backend with dir driver on a XFS volume with project quotas enabled
  2. Create a container with root disk size of 10GiB
  3. Check extended file attributes using xfs_quota -x -c 'stat' /mnt/containers/c-1/rootfs/etc/hosts
  4. Check ctime using stat /mnt/containers/u1/rootfs/etc/hosts
  5. Update container root disk size to 15GiB: lxc config device set c-1 root size="15GiB"
  6. Re-check extended file attributes and ctime: ctime has been updated, but no file attributes have changed
@mariusklocke
Copy link
Author

If you agree to my conclusions: I already spotted an easy fix for this issue. The setProject function is only called from here. Some lines above there already is a check if the quota project ID has changed. I think the issue would be resolved, if the quota.setProject() call would be moved to into the if statement body above.

@mariusklocke
Copy link
Author

Is there missing anything in my issue report? I would appreciate any kind of feedback.

I recently discovered, that altering every file also causes errors, when the container executes a process where files are deleted while the instance update in LXD is still executing:

Error: Failed to update device "root": lstat /var/snap/lxd/common/lxd/storage-pools/default/containers/my-awesome-container/rootfs/tmp/YmCbeF: no such file or directory

@tomponline
Copy link
Member

Thanks for your report. We will endeavour to investigate this soon.

I'm not sure why the files are all modified, although I suspect its to do with ensuring all files take part in the quota.
Perhaps there is a better way.

@tomponline tomponline added this to the lxd-6.1 milestone Apr 30, 2024
@tomponline tomponline modified the milestones: lxd-6.1, lxd-6.2 May 13, 2024
@mariusklocke
Copy link
Author

mariusklocke commented Jul 24, 2024

I just made a PR for this. We encounter the "no such file or directory" error now regularly and have no other option than to retry, which is a bummer for an operation which can take more than 5 minutes. I think a bugfix should be made for LXD 5.21

@tomponline
Copy link
Member

We encounter the "no such file or directory" error now regularly and have no other option than to retry, which is a bummer for an operation which can take more than 5 minutes.

A quick fix would be to have LXD detect this sort of error and continue rather than failing.

As for not updating project on all files, this will need more consideration I think.

@mariusklocke
Copy link
Author

I guess that how "quick" that fix would depend on who does the coding. I am not a Go developer at all and re-arranging code to fix a logical error is something I can contribute quickly. Getting into how error handling in Go works, is something very different.

@tomponline
Copy link
Member

See if #13815 helps once it lands in latest/edge.

To be clear im not rejecting your PR, but in isolation I don't believe it will prevent the "no such file or directory" errors you're getting because the error appears to be happening in quota.SetProject.

Admittedly the likely hood of it happening is reduced by your PR as LXD won't do two walks of the container's filesystem, first to remove the project and then to re-add the new one.

I think your PR could provide a speed up indeed, but I will need to consider it more carefully because I'd like to understand why it was done that way originally before changing it.

@mariusklocke
Copy link
Author

mariusklocke commented Jul 24, 2024

I'm not entirely happy, but thanks for providing an alternative solution to the issue. My primary use case is all containers with all limited root disks. If I understand the handling of quota project IDs in LXD correctly, they should never actually change except there is a new volume ID for the root disk (don't know if that ever happens). Only then the "file walk" should be required. My PR would fix performance and also "not found" errors for my use case, because the "file walk" would never happen in my use case (maybe once on container creation?). But maybe there are other use cases.

Any chance to get your fix in the currently supported LTS versions? Still preparing the LXD 5.21 update.

@tomponline
Copy link
Member

Only then the "file walk" should be required. My PR would fix performance and also "not found" errors for my use case, because the "file walk" would never happen in my use case (maybe once on container creation?). But maybe there are other use cases.

Understood, once my PR lands, if you can rebase your PR so its mergeable and then I'll review in more detail.
It makes sense to have the file missing check there in any case.

Any chance to get your fix in the currently supported LTS versions? Still preparing the LXD 5.21 update.

Yeah ill backport this and it'll be in LXD 5.21.3.

tomponline added a commit that referenced this issue Aug 30, 2024
Fixes #13115

This avoids setting project quotas when the project does not change
([from Incus](lxc/incus#1163)), and also avoids
unnecessary project setting on `DeleteProject`.

Reported-by: Marius Klocke
[[email protected]](mailto:[email protected])

Closes #13813
@mariusklocke
Copy link
Author

@tomponline I see that there have been backports to stable-5.21 in the last weeks, but the fix for this issue wasn't included?

@tomponline
Copy link
Member

I've included it in #14185

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants