Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable NSS modules when using the leakchecker #62

Conversation

KJTsanaktsidis
Copy link
Contributor

@KJTsanaktsidis KJTsanaktsidis commented Jan 19, 2024

The leakchecker will report leaked file descriptors when tests do things like access Etc.getgrgid, for example, if NSS modules (like sss) handle these lookups by connecting to a daemon like sssd and leave the connection open.

To address this, we can call glibc's __nss_configure_lookup to override NSS modules configured in /etc/nsswitch.conf and only use ordinary file/DNS lookups.

This should fix the rubyci.org failures on Amazon Linux like this one: http://rubyci.s3.amazonaws.com/amazon2023/ruby-master/log/20240119T003005Z.log.html.gz#rubyspec

@KJTsanaktsidis
Copy link
Contributor Author

@eregon @headius do you see any problems with doing this on jruby or truffleruby? AFAICT jruby seems to have a working FFI implementation and I can't see a reason why the NSS modules wouldn't be just as problematic on the JVM?

@eregon
Copy link
Member

eregon commented Jan 19, 2024

Do you know how the test-all leakchecker deal with this case?

Your description makes a lot of sense, I'm just wondering if there is another way to handle this because calling __nss_configure_lookup feels a bit low-level.

@eregon
Copy link
Member

eregon commented Jan 19, 2024

@eregon @headius do you see any problems with doing this on jruby or truffleruby? AFAICT jruby seems to have a working FFI implementation and I can't see a reason why the NSS modules wouldn't be just as problematic on the JVM?

It likely does indeed.
The MSpec LeakChecker is opt-in, I think currently neither TruffleRuby nor JRuby enable it (using CHECK_LEAKS=true).
One reason is it uses ObjectSpace.each_object and that's really expensive on non-CRuby.

Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR and the investigation!

@eregon
Copy link
Member

eregon commented Jan 19, 2024

Could you rebase so we get a run of ruby/spec in CI?
I changed the Check ruby/spec still passes with MSpec changes job to use Ruby 3.3.0 in #63

The leakchecker will report leaked file descriptors when tests do things
like access `Etc.getgrgid`, for example, if NSS modules (like `sss`)
handle these lookups by connecting to a daemon like `sssd` and leave the
connection open.

To address this, we can call glibc's `__nss_configure_lookup` to
override NSS modules configured in /etc/nsswitch.conf and only use
ordinary file/DNS lookups.
@KJTsanaktsidis
Copy link
Contributor Author

Do you know how the test-all leakchecker deal with this case?

I think it just doesn't - i've definitely seen lsof output from this line https://github.com/ruby/ruby/blob/ac4046d34b4e0850e9ff7573b795284fea6c2741/tool/lib/leakchecker.rb#L115 on CI, i think it just doesn't cause the test to fail. I could certainly add something like this there as well.

Your description makes a lot of sense, I'm just wondering if there is another way to handle this because calling __nss_configure_lookup feels a bit low-level.

I really expected there to be an environment variable to disable NSS modules, but there just isn't - i spent quite a while staring at the glibc & sss source code looking for something. There is a way to ask NSS module to clean up, by calling __libc_freeres(), but that instructs glibc to free all of its resources - it's designed for valgrind to call before process exit, and if you call it and keep running all kinds of strange things happen (like *environ getting freed). So, short version, no :(

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_failing_amazon_linux_leaks branch from 0710533 to 8e4c979 Compare January 19, 2024 23:09
@KJTsanaktsidis
Copy link
Contributor Author

Hm. It has the same failures as https://github.com/ruby/mspec/actions/runs/7582829391/job/20653202072. I don't know why, the ruby spec suite passed when I ran it with this mspec branch on my machine.

@KJTsanaktsidis
Copy link
Contributor Author

Oh, I see cruby has disabled the failing set tests, for example, because the set gem v1.1.0 has a stricter test for sett-iness: ruby/ruby@dc7785e

I thought that ruby/spec was mirrored into ruby/ruby but seems like the correspondence is somewhat looser.

@KJTsanaktsidis
Copy link
Contributor Author

Anyway I'll merge this (ruby/mspec is mirrored into the ruby/ruby tree afaict), the set stuff is a different problem.

@KJTsanaktsidis KJTsanaktsidis merged commit f9d657c into ruby:master Jan 21, 2024
8 of 9 checks passed
@KJTsanaktsidis KJTsanaktsidis deleted the ktsanaktsidis/fix_failing_amazon_linux_leaks branch January 21, 2024 08:08
KJTsanaktsidis added a commit to KJTsanaktsidis/ruby that referenced this pull request Jan 21, 2024
The leakchecker will report leaked file descriptors when tests do things
like access `Etc.getgrgid`, for example, if NSS modules (like `sss`)
handle these lookups by connecting to a daemon like `sssd` and leave the
connection open.

To address this, we can call glibc's `__nss_configure_lookup` to
override NSS modules configured in /etc/nsswitch.conf and only use
ordinary file/DNS lookups.

(This is a cherry-pick of a patch applied to ruby/mspec here:
ruby/mspec#62)
KJTsanaktsidis added a commit to KJTsanaktsidis/ruby that referenced this pull request Jan 22, 2024
The leakchecker will report leaked file descriptors when tests do things
like access `Etc.getgrgid`, for example, if NSS modules (like `sss`)
handle these lookups by connecting to a daemon like `sssd` and leave the
connection open.

To address this, we can call glibc's `__nss_configure_lookup` to
override NSS modules configured in /etc/nsswitch.conf and only use
ordinary file/DNS lookups.

(This is a cherry-pick of a patch applied to ruby/mspec here:
ruby/mspec#62)
KJTsanaktsidis added a commit to ruby/ruby that referenced this pull request Jan 22, 2024
The leakchecker will report leaked file descriptors when tests do things
like access `Etc.getgrgid`, for example, if NSS modules (like `sss`)
handle these lookups by connecting to a daemon like `sssd` and leave the
connection open.

To address this, we can call glibc's `__nss_configure_lookup` to
override NSS modules configured in /etc/nsswitch.conf and only use
ordinary file/DNS lookups.

(This is a cherry-pick of a patch applied to ruby/mspec here:
ruby/mspec#62)
@eregon
Copy link
Member

eregon commented Jan 23, 2024

@KJTsanaktsidis This broke ruby/spec CI on 3.0 and 3.1 on Linux, see https://github.com/ruby/spec/actions/runs/7624400587/job/20766459296?pr=1129
Could you fix it?

@KJTsanaktsidis
Copy link
Contributor Author

Oh I see, I must be using fiddle features that are too new (did we move the names of the types around?). I’ll fix it first thing tomorrow - sorry about this!

@KJTsanaktsidis
Copy link
Contributor Author

Yeah, looks like we can't depend on ruby/fiddle@49fa723 or ruby/fiddle@0ffcaa3 being available.

#66 should fix it @eregon . My apologies again!

@eregon
Copy link
Member

eregon commented Jan 24, 2024

@KJTsanaktsidis No worries, it happens, it's tricky to support older Rubies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants