Fix compilation without HAVE_GSS_KRB5_CRED_NO_CI_FLAGS_X #70723

Closed
filipnavara wants to merge 1 commit

Conversation

filipnavara
Member

This is a mostly theoretical fix, since the only supported system that doesn't have GSS_KRB5_CRED_NO_CI_FLAGS_X is RHEL 7. It could still happen with a stale CMake cache, though.

Ref: #70447 (comment)

@ghost added the community-contribution label Jun 14, 2022
@ghost

ghost commented Jun 14, 2022

Tagging subscribers to this area: @dotnet/ncl, @vcsjones
See info in area-owners.md if you want to be subscribed.

Author: filipnavara
Assignees: -
Labels: area-System.Net.Security, community-contribution

Milestone: -

@wfurt
Member

wfurt commented Jun 14, 2022

Can you try it, @EgorBo? Waiting for CI may not be worth it, as it passed on the original change.

Member

@wfurt wfurt left a comment

LGTM

@EgorBo
Member

EgorBo commented Jun 14, 2022

I don't think it's a stale CMake cache. It's a WSL instance I've just created (Ubuntu 20.04), and I followed all the steps to configure it for dotnet/runtime.

It might be relevant that I use cross-compilation to build dotnet/runtime for linux-arm64 there (rootfs stuff).

@EgorBo
Member

EgorBo commented Jun 14, 2022

I've just verified the fix and it works for me, thanks!

@wfurt
Member

wfurt commented Jun 14, 2022

Test failures seem relevant, @filipnavara. There is some weird SIGSEGV...

@filipnavara
Member Author

Test failures seem relevant, @filipnavara. There is some weird SIGSEGV...

I have seen that same crash locally on the same test suites. It happened even with a clean checkout, but because some of the tests are not run on debug builds (e.g. the whole System.Net.Http.FunctionalTests), it could easily have been lurking here for a while. When run locally, the crash happened only on certain runs. I'll look at the core dumps.

@AaronRobinsonMSFT
Member

Hitting this issue in the CI - https://dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_apis/build/builds/1823854/logs/506.

2022-06-14T16:24:41.4050844Z [ 10%] Building C object libs-native/System.Security.Cryptography.Native/CMakeFiles/objlib.dir/apibridge.c.o
2022-06-14T16:24:41.4180214Z /__w/1/s/src/native/libs/System.Net.Security.Native/pal_gssapi.c:162:9: error: macro is not used [-Werror,-Wunused-macros]
2022-06-14T16:24:41.4190934Z #define PER_FUNCTION_BLOCK(fn) \
2022-06-14T16:24:41.4192051Z         ^
2022-06-14T16:24:41.4290025Z 1 error generated.
2022-06-14T16:24:41.4368014Z libs-native/System.Net.Security.Native/CMakeFiles/System.Net.Security.Native-Static.dir/build.make:75: recipe for target 'libs-native/System.Net.Security.Native/CMakeFiles/System.Net.Security.Native-Static.dir/pal_gssapi.c.o' failed
2022-06-14T16:24:41.4371145Z CMakeFiles/Makefile2:1938: recipe for target 'libs-native/System.Net.Security.Native/CMakeFiles/System.Net.Security.Native-Static.dir/all' failed
2022-06-14T16:24:41.4373529Z make[3]: *** [libs-native/System.Net.Security.Native/CMakeFiles/System.Net.Security.Native-Static.dir/pal_gssapi.c.o] Error 1
2022-06-14T16:24:41.4375637Z make[2]: *** [libs-native/System.Net.Security.Native/CMakeFiles/System.Net.Security.Native-Static.dir/all] Error 2
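
For context, -Wunused-macros reports a macro that is defined in the main source file but never expanded, and -Werror turns that into a build failure. The following is a minimal sketch of how an X-macro helper such as PER_FUNCTION_BLOCK can end up unused, and one way to avoid it. It is not the code from pal_gssapi.c: HAVE_OPTIONAL_API, FOR_ALL_OPTIONAL_FUNCTIONS, and the function names are hypothetical. The idea is to define the per-entry macro only in configurations where the list it is applied to is non-empty.

```c
#include <stddef.h>

/* Hypothetical configure-time switch; leave it undefined to model a
   toolchain whose headers lack the optional API. */
/* #define HAVE_OPTIONAL_API 1 */

#ifdef HAVE_OPTIONAL_API
/* X-macro list of optional functions (hypothetical names). */
#define FOR_ALL_OPTIONAL_FUNCTIONS \
    PER_FUNCTION_BLOCK(optional_fn_a) \
    PER_FUNCTION_BLOCK(optional_fn_b)

/* The per-entry macro is defined only where the list is non-empty, so it
   is always expanded at least once and -Wunused-macros stays quiet. */
#define PER_FUNCTION_BLOCK(fn) #fn,
static const char *const optional_function_names[] = { FOR_ALL_OPTIONAL_FUNCTIONS NULL };
#undef PER_FUNCTION_BLOCK
#else
/* Nothing to enumerate in this configuration, so no helper macro at all. */
static const char *const optional_function_names[] = { NULL };
#endif

/* Number of optional functions known to this build. */
static size_t optional_function_count(void)
{
    size_t n = 0;
    while (optional_function_names[n] != NULL)
        n++;
    return n;
}
```

If PER_FUNCTION_BLOCK were instead defined unconditionally while the list expanded to nothing, the macro would never be expanded and the warning above would fire.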

@wfurt
Member

wfurt commented Jun 14, 2022

What OS, @AaronRobinsonMSFT? The original PR had all legs clean.

@AaronRobinsonMSFT
Member

AaronRobinsonMSFT commented Jun 14, 2022

What OS, @AaronRobinsonMSFT? The original PR had all legs clean.

Agreed. Unsure why it is failing. This is on CoreCLR Product Build Linux x86 checked.

See #70685

@wfurt
Member

wfurt commented Jun 14, 2022

It should be mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-cross-x86-linux-20211022152824-f853169, if I'm reading the logs correctly.

@filipnavara
Member Author

filipnavara commented Jun 14, 2022

Test failures seem relevant, @filipnavara. There is some weird SIGSEGV...

This is going to be fun: the crash is inside libgssapi_krb5.so.2. I haven't obtained the right symbols yet.

The managed stack leading to it:

00007EF266FFA118 00007ef2840643c7 [InlinedCallFrame: 00007ef266ffa118] Interop+NetSecurityNative.<InitiateCredWithPassword>g____PInvoke|17_0(Status*, Int32, IntPtr, Byte*, Int32, IntPtr*)
00007EF266FFA118 00007f33328d9e4c [InlinedCallFrame: 00007ef266ffa118] Interop+NetSecurityNative.<InitiateCredWithPassword>g____PInvoke|17_0(Status*, Int32, IntPtr, Byte*, Int32, IntPtr*)
00007EF266FFA110 00007F33328D9E4C ILStubClass.IL_STUB_PInvoke(Status*, Int32, IntPtr, Byte*, Int32, IntPtr*)
00007EF266FFA220 00007F33328D9B80 Interop+NetSecurityNative.InitiateCredWithPassword(Status ByRef, Boolean, Microsoft.Win32.SafeHandles.SafeGssNameHandle, System.String, Int32, Microsoft.Win32.SafeHandles.SafeGssCredHandle ByRef) [/_/src/libraries/System.Net.Http/src/Microsoft.Interop.LibraryImportGenerator/Microsoft.Interop.LibraryImportGenerator/LibraryImports.g.cs @ 328]
00007EF266FFA430 00007F33328D9009 Microsoft.Win32.SafeHandles.SafeGssCredHandle.Create(System.String, System.String, Boolean) [/_/src/libraries/Common/src/Microsoft/Win32/SafeHandles/GssSafeHandles.cs @ 120]
00007EF266FFA520 00007F33328D6EFF System.Net.Security.SafeFreeNegoCredentials..ctor(Boolean, System.String, System.String, System.String) [/_/src/libraries/Common/src/System/Net/Security/Unix/SafeFreeNegoCredentials.cs @ 74]
00007EF266FFA600 00007F33328D6AF1 System.Net.Security.NegotiateStreamPal.AcquireCredentialsHandle(System.String, Boolean, System.Net.NetworkCredential) [/_/src/libraries/Common/src/System/Net/Security/NegotiateStreamPal.Unix.cs @ 532]
00007EF266FFA6E0 00007F33328D6881 System.Net.NTAuthentication.Initialize(Boolean, System.String, System.Net.NetworkCredential, System.String, System.Net.ContextFlagsPal, System.Security.Authentication.ExtendedProtection.ChannelBinding) [/_/src/libraries/Common/src/System/Net/NTAuthentication.Common.cs @ 128]
00007EF266FFA7D0 00007F33328D64C4 System.Net.NTAuthentication..ctor(Boolean, System.String, System.Net.NetworkCredential, System.String, System.Net.ContextFlagsPal, System.Security.Authentication.ExtendedProtection.ChannelBinding) [/_/src/libraries/Common/src/System/Net/NTAuthentication.Common.cs @ 98]
00007EF266FFA820 00007F33328B6707 System.Net.Http.AuthenticationHelper+<SendWithNtAuthAsync>d__52.MoveNext() [/_/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/AuthenticationHelper.NtAuth.cs @ 169]

@filipnavara
Member Author

I'm somewhat convinced that the crashing code is a null dereference here:

https://github.com/krb5/krb5/blob/f573f7f8ee5269103a0492d6521a3242c5ffb63b/src/lib/gssapi/krb5/gssapi_krb5.c#L487-L488

I have no hard evidence, though. Unless I find something quickly, I am fine with dropping the whole HAVE_GSS_KRB5_CRED_NO_CI_FLAGS_X support instead of trying to fix it.

@wfurt
Member

wfurt commented Jun 14, 2022

We can roll back the original change to get to a stable point, so we have more time to investigate. The problem seems to be specific to particular versions, and that should help.

@filipnavara
Member Author

We can roll back the original change to get to a stable point, so we have more time to investigate.

I think that's reasonable at this point.

@filipnavara
Member Author

Hmm, both of the failures were on RedHat 7 machines. Those are the only ones that should be missing GSS_KRB5_CRED_NO_CI_FLAGS_X. I'm feeling like I am missing something really obvious.

@wfurt
Member

wfurt commented Jun 14, 2022

I can look as well, @filipnavara. Maybe NULL after the load/lookup fails?

@filipnavara
Member Author

filipnavara commented Jun 14, 2022

Maybe NULL after the load/lookup fails?

My expectation is that the build is done on a machine that has HAVE_GSS_KRB5_CRED_NO_CI_FLAGS_X set. The dlsym should fail for GSS_KRB5_CRED_NO_CI_FLAGS_X, gss_set_cred_option, or both.

For this build case we should get #define GSS_KRB5_CRED_NO_CI_FLAGS_X_AVAILABLE (gss_set_cred_option_ptr != NULL && GSS_KRB5_CRED_NO_CI_FLAGS_X_ptr != NULL) and that is supposed to guard all the call paths that would possibly call into the APIs.

There may be some flaw in that logic that I am not seeing, or RedHat is shipping something that I didn't expect. The dumps showed the crash actually happened inside the libgssapi_krb5 library so I would not rule out that RedHat shipped some backported patches that don't quite work.
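
The guard described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual pal_gssapi.c code: it assumes the library is opened with dlopen and both symbols are resolved with dlsym, it uses opaque stand-in types instead of the GSSAPI headers, and the helper names load_gss_shim and try_set_no_ci_flags are hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins so the sketch compiles without the GSSAPI headers. */
typedef void *cred_handle_t;
typedef void *oid_t;
typedef void *buffer_t;
typedef uint32_t (*set_cred_option_fn)(uint32_t *minor, cred_handle_t *cred,
                                       oid_t option, buffer_t value);

/* Both stay NULL when the library or the symbol is missing at run time. */
static set_cred_option_fn gss_set_cred_option_ptr = NULL;
static oid_t *GSS_KRB5_CRED_NO_CI_FLAGS_X_ptr = NULL;

#define GSS_KRB5_CRED_NO_CI_FLAGS_X_AVAILABLE \
    (gss_set_cred_option_ptr != NULL && GSS_KRB5_CRED_NO_CI_FLAGS_X_ptr != NULL)

/* Returns 1 if the library could be opened, 0 otherwise. A missing optional
   symbol is not an error; it simply leaves its pointer NULL. */
static int load_gss_shim(const char *library_name)
{
    void *lib = dlopen(library_name, RTLD_LAZY);
    if (lib == NULL)
        return 0;
    /* POSIX idiom for storing a dlsym result in a function pointer. */
    *(void **)(&gss_set_cred_option_ptr) = dlsym(lib, "gss_set_cred_option");
    /* The option is a data symbol: dlsym yields the address of the variable. */
    GSS_KRB5_CRED_NO_CI_FLAGS_X_ptr = (oid_t *)dlsym(lib, "GSS_KRB5_CRED_NO_CI_FLAGS_X");
    return 1;
}

/* Returns -1 when the optional API is unavailable and the call is skipped,
   otherwise the major status reported by the library. */
static int64_t try_set_no_ci_flags(cred_handle_t *cred, buffer_t empty_buffer)
{
    uint32_t minor = 0;
    if (!GSS_KRB5_CRED_NO_CI_FLAGS_X_AVAILABLE)
        return -1;
    return gss_set_cred_option_ptr(&minor, cred, *GSS_KRB5_CRED_NO_CI_FLAGS_X_ptr, empty_buffer);
}
```

With this shape, every call path goes through the availability check, so a failed lookup leads to a skipped call rather than a call through a NULL pointer. On older glibc versions the sketch needs to be linked with -ldl.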

@filipnavara
Member Author

Is there any way to figure out what exact RHEL version is running in the RedHat.7.Amd64.Open pool? (and ideally the krb5-libs package version) /cc @MattGal

@MattGal
Member

MattGal commented Jun 14, 2022

Is there any way to figure out what exact RHEL version is running in the RedHat.7.Amd64.Open pool? (and ideally the krb5-libs package version) /cc @MattGal

If you have access to the dnceng/internal project you can check out the image definition yamls (I don't think you do?) but currently it's running an image derived from this Azure gallery image:

  Image:
    Publisher: RedHat
    Offer: RHEL
    Sku: 7_9
    Version: 7.9.2022032201

I can try to get the /etc/os-release value off this image if it's important too.

@wfurt
Member

wfurt commented Jun 14, 2022

I will try to grab a repro machine so we can do more testing...

@filipnavara
Member Author

Thanks, that's good enough for me!

@filipnavara
Member Author

filipnavara commented Jun 15, 2022

I will try to grab a repro machine so we can do more testing...

I created a machine from the Azure image and downloaded the crashing Helix payload onto it. I ran it a couple of times and it didn't crash. I have now added the gssntlmssp package (not available in the official repos, so I had to hunt down the .rpm), but it still doesn't seem to crash. RedHat doesn't exactly make it easy to get debug symbols, but I will try to check whether there is something salvageable in the crash dump.

UPD: Eventually it crashed on the 7th run or so with the NTLM SSP installed, so I guess it may repro on this image.

If you have access to the dnceng/internal project you can check out the image definition yamls (I don't think you do?)

Correct, I don't have access to that repository.

@filipnavara
Member Author

filipnavara commented Jun 15, 2022

I managed to get a somewhat more useful stack trace:

#0  0x00007f2b2b3353c7 in gssint_get_mechanism (oid=0x20) at g_initialize.c:1128
#1  0x00007f2b2b3380af in gss_set_cred_option (minor_status=0x7f2afbffbed0,
    cred_handle=cred_handle@entry=0x7f2afbffbe88, desired_object=0x7f2b2b564610 <krb5_gss_oid_array+112>,
    value=0x7f2afbffc068) at g_set_cred_option.c:151
#2  0x00007f2b2b338213 in gssspi_set_cred_option (minor_status=<optimized out>, cred=0x7f2b0c148400,
    desired_object=<optimized out>, value=<optimized out>) at g_set_cred_option.c:196
#3  0x00007f2b2b338078 in gss_set_cred_option (minor_status=minor_status@entry=0x7f2afbffbfa0,
    cred_handle=cred_handle@entry=0x7f2afbffbf30,
    desired_object=desired_object@entry=0x7f2b2b564610 <krb5_gss_oid_array+112>, value=0x7f2afbffc068)
    at g_set_cred_option.c:160
#4  0x00007f2b2b354a21 in spnego_gss_set_cred_option (minor_status=0x7f2afbffbfa0, cred_handle=0x7f2b0c148e00,
    desired_object=0x7f2b2b564610 <krb5_gss_oid_array+112>, value=<optimized out>) at spnego_mech.c:2432
#5  0x00007f2b2b338078 in gss_set_cred_option (minor_status=0x7f2afbffc064, cred_handle=<optimized out>,
    desired_object=0x7f2b2b564610 <krb5_gss_oid_array+112>, value=0x7f2afbffc068) at g_set_cred_option.c:160
#6  0x00007f6bd409dbe3 in AcquireCredWithPassword (minorStatus=0x7f2afbffc4c8, isNtlm=0, desiredName=0x7f2b0c148510,
    password=0x7f2afbffc220 "password", passwdLen=8, credUsage=1, outputCredHandle=0x7f2afbffc3d0)
    at /__w/1/s/src/native/libs/System.Net.Security.Native/pal_gssapi.c:643
#7  0x00007f6bd409d945 in NetSecurityNative_InitiateCredWithPassword (minorStatus=0x7f2afbffc4c8, isNtlm=0,
    desiredName=0x7f2b0c148510, password=0x7f2afbffc220 "password", passwdLen=8, outputCredHandle=0x7f2afbffc3d0)

#3 correctly sends a valid gss_union_cred_t structure to gss_set_cred_option. It has one mechanism, NTLM, and its associated credential is a gssntlm_cred structure. NTLM SSP, however, does not implement gssspi_set_cred_option, so the call should fail straight away. Instead it goes through the generic gssspi_set_cred_option implementation, which calls gss_set_cred_option again and expects a gss_union_cred_t structure where a gssntlm_cred structure is passed. This is definitely a bug in the GSSAPI implementation, but I haven't yet found when and how it was fixed.
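
The behavior this comment expects can be modeled with a toy dispatcher. This is a simplified sketch and not the krb5 mechglue code: every type and name here (toy_union_cred, toy_mech, toy_set_cred_option, and so on) is hypothetical, and the status values merely stand in for GSS_S_COMPLETE and GSS_S_UNAVAILABLE. When a mechanism has no set_cred_option handler, the dispatcher reports "unavailable" rather than falling back to a generic path that would treat the mechanism-specific credential as a union credential.

```c
#include <stddef.h>

/* Stand-ins for GSS_S_COMPLETE and GSS_S_UNAVAILABLE. */
enum toy_status { TOY_COMPLETE = 0, TOY_UNAVAILABLE = 1 };

/* Mechanism-specific credential (plays the role of gssntlm_cred). */
struct toy_mech_cred { int last_option; };

struct toy_mech {
    const char *name;
    /* NULL when the mechanism does not implement the operation. */
    enum toy_status (*set_cred_option)(struct toy_mech_cred *cred, int option);
};

/* Union credential: one mechanism-specific credential per mechanism
   (plays the role of gss_union_cred_t). */
struct toy_union_cred {
    size_t count;
    const struct toy_mech **mechs;
    struct toy_mech_cred **creds;
};

/* Example handler for a mechanism that does support the option. */
static enum toy_status toy_store_option(struct toy_mech_cred *cred, int option)
{
    cred->last_option = option;
    return TOY_COMPLETE;
}

/* Dispatch to each mechanism, handing it its own credential type. */
static enum toy_status toy_set_cred_option(const struct toy_union_cred *cred, int option)
{
    enum toy_status result = TOY_UNAVAILABLE;
    for (size_t i = 0; i < cred->count; i++) {
        const struct toy_mech *mech = cred->mechs[i];
        /* No handler: skip, and never reinterpret creds[i] as a union cred. */
        if (mech->set_cred_option == NULL)
            continue;
        enum toy_status status = mech->set_cred_option(cred->creds[i], option);
        if (status != TOY_COMPLETE)
            return status;
        result = TOY_COMPLETE;
    }
    return result;
}

/* Builds a one-mechanism union credential and sets an option on it. */
static enum toy_status toy_demo(int mech_has_handler)
{
    struct toy_mech mech = { "toy-mech", mech_has_handler ? toy_store_option : NULL };
    struct toy_mech_cred mech_cred = { 0 };
    const struct toy_mech *mechs[] = { &mech };
    struct toy_mech_cred *creds[] = { &mech_cred };
    struct toy_union_cred cred = { 1, mechs, creds };
    return toy_set_cred_option(&cred, 42);
}
```

In this model, toy_demo(0) corresponds to the NTLM-only credential from the trace and yields TOY_UNAVAILABLE, while toy_demo(1) yields TOY_COMPLETE.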

@filipnavara
Member Author

filipnavara commented Jun 15, 2022

Reduced repro of the SIGSEGV crash on RHEL7 w/ NTLM SSP:

#include <gssapi/gssapi_ext.h>
#include <gssapi/gssapi_krb5.h>
#include <assert.h>
#include <stdio.h>

static char gss_spnego_oid_value[] = "\x2b\x06\x01\x05\x05\x02"; // Binary representation of SPNEGO Oid (RFC 4178)
static gss_OID_desc gss_mech_spnego_OID_desc = {.length = 6, .elements = gss_spnego_oid_value};

int main()
{
    uint32_t majorStatus;
    uint32_t minorStatus;
    gss_cred_id_t credHandle = NULL;

    gss_name_t name = NULL;
    gss_buffer_desc inputNameBuffer = {.length = 4, .value = "user"};
    majorStatus = gss_import_name(&minorStatus, &inputNameBuffer, GSS_C_NT_USER_NAME, &name);
    assert(majorStatus == GSS_S_COMPLETE);

    gss_OID_desc gss_mech_OID_desc = gss_mech_spnego_OID_desc;
    gss_OID_set_desc gss_mech_OID_set_desc = {.count = 1, .elements = &gss_mech_OID_desc};
    gss_OID_set desiredMech = &gss_mech_OID_set_desc;
    gss_buffer_desc passwordBuffer = {.length = 8, .value = "password"};
    majorStatus = gss_acquire_cred_with_password(
        &minorStatus, name, &passwordBuffer, 0, desiredMech, GSS_C_INITIATE, &credHandle, NULL, NULL);
    assert(majorStatus == GSS_S_COMPLETE);

    gss_buffer_desc emptyBuffer = GSS_C_EMPTY_BUFFER;
    majorStatus = gss_set_cred_option(&minorStatus, &credHandle, GSS_KRB5_CRED_NO_CI_FLAGS_X, &emptyBuffer);
    assert(majorStatus == GSS_S_UNAVAILABLE || majorStatus == GSS_S_COMPLETE);

    return 0;
}

@filipnavara
Member Author

Funnily enough, the bug is still present even in the latest krb5 packages. It just happens that the structures align in such a way that it doesn't crash.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 15, 2022
@karelz karelz added this to the 7.0.0 milestone Jul 19, 2022