Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal protocol conformance can be stripped by accident #1061

Closed
thomasvl opened this issue Aug 25, 2020 · 14 comments
Closed

Internal protocol conformance can be stripped by accident #1061

thomasvl opened this issue Aug 25, 2020 · 14 comments

Comments

@thomasvl
Copy link
Collaborator

Trying to sync in current HEAD to google, we're seeing some of the new tests adding when addressing #1048 fail on linux with a compiler that is close to HEAD. I'm guessing that means compiler bug, but we haven't gotten a chance to dig.

Tests/SwiftProtobufTests/Test_JSON_Conformance.swift:151: error: Test_JSON_Conformance.testNullSupport_oneofNullValue : XCTAssertNotEqual failed: ("SwiftProtobufTests.ProtobufTestMessages_Proto3_TestAllTypesProto3:
") is equal to ("SwiftProtobufTests.ProtobufTestMessages_Proto3_TestAllTypesProto3:
") - 
Tests/SwiftProtobufTests/Test_JSON_Conformance.swift:160: error: Test_JSON_Conformance.testNullSupport_oneofNullValue : XCTAssertEqual failed: ("{}") is not equal to ("{"oneofNullValue":null}") - 
Tests/SwiftProtobufTests/Test_JSON_Conformance.swift:167: error: Test_JSON_Conformance.testNullSupport_oneofNullValue : XCTAssertEqual failed: ("[]") is not equal to ("[192, 7, 0]") - 
Tests/SwiftProtobufTests/Test_JSON_Conformance.swift:211: error: Test_JSON_Conformance.testNullSupport_repeatedNullValue : failed - Decode failed with error illegalNull: {"repeatedNullValue": [0, "NULL_VALUE", null]}
@thomasvl thomasvl added the bug label Aug 25, 2020
@tbkka
Copy link
Contributor

tbkka commented Aug 25, 2020

Ouch. I'll take a look.

@tbkka tbkka self-assigned this Aug 25, 2020
@allevato
Copy link
Collaborator

The toolchain we build internally is currently at snapshot 2020-08-18, but unfortunately I can't reproduce the failure with any of the public toolchains. The tests pass with the 2020-08-18 both on macOS and on a clean Ubuntu image I just fired up. I also synced our internal toolchain back to a previous state pre-2020-08-18 and it also failed there 🤔

Some print debugging is showing that the dynamic cast for the _CustomJSONCodable conformance in decodeSingularEnumField is failing for us. What's puzzling is that it only fails for NullValue, since we have tests for the other well-known types and those are still passing... Thomas pointed out that one difference here is that NullValue is an enum while the others are structs.

This could be an issue that's only related to our internal builds. We'll scrounge around some more.

@tbkka
Copy link
Contributor

tbkka commented Aug 25, 2020

Based on the failure messages Thomas shared, I was going to guess it was an issue with casting from a SwiftProtobuf.Enum existential to _CustomJSONCodable. I wonder if swiftlang/swift#33561 might fix it?

@allevato
Copy link
Collaborator

allevato commented Aug 26, 2020

I figured out the issue here, after a couple hours of using gdb and printf debugging in the Swift runtime. 😰

The conformance Google_Protobuf_NullValue: _CustomJSONCodable is unique because it's in a .swift file by itself. Our build process (using Bazel) is also unique in that, unlike SwiftPM which passes all the object files directly to the linker, we build a static archive and pass that to the linker.

There will never be references to symbols in the Google_Protobuf_NullValue+Extensions.swift.o file unless we did so explicitly, but we don't because the _CustomJSONCodable protocol is only ever used as a dynamic check and existential. This means in our case, the linker will never fetch that object file from the static archive, and the conformance never exists in the final binary.

So, this is basically the same problem Objective-C developers faced with source files containing only categories and why the -ObjC flag exists in ld64, just on a different platform this time. Except it's a little worse because I can't figure out a way to tell lld to only force-load object files if they contain Swift metadata symbols (--undefined doesn't seem to work, maybe because they're in a different section?). So we'd have to force-load the entire archive, which would bloat binaries that didn't need everything.

A fairly simple solution would be to just move the extension out of its own file and into a file that we know would be fetched if the implementation is needed, like JSONDecoder.swift (to pick an arbitrary but related one).

Another option would be to create a dummy symbol in the file and then reference it from another dummy symbol in a file that we know will get fetched. I'll need to test if visibility matters here (e.g., internal might get removed by the compiler too early and then we're back to the original problem, and it wouldn't be great to have to leak a public symbol somewhere for this).

@tbkka
Copy link
Contributor

tbkka commented Aug 27, 2020

Nice work!

I've been considering adding an ExpressibleByNilLiteral conformance to NullValue. Would that help here?

Alternatively, we could reasonably bundle this into any of several other files. Hmmm....

@thomasvl
Copy link
Collaborator Author

Given the other files like that, yes, adding something public should work since it should get tagged in such a way to force it to get included. The other option is just another file with something public. We just have to remember to keep these sorta things always in files with something public.

@allevato
Copy link
Collaborator

I tested it and just having a conformance for ExpressibleByNilLiteral (or any public symbol) in the same object file isn't enough to get it included. There would also have to be a reference to that symbol—so if we added tests for the new conformance, it would get included in our test binary, but if someone wrote a binary that didn't use the conformance, they wouldn't get the custom JSON behavior either.

The issue is that nothing signals to the linker that the .o file for this conformance needs to be fetched on Linux when it's in a static archive. This is a wider-spread problem for Swift in general than just for swift-protobuf, but the rest of the Swift ecosystem manages to avoid it in various other ways that mask the problem:

  • Swift Package Manager and Xcode both pass the .o files directly to the linker, rather than creating an intermediate static archive, so the .o files are always fetched whether there are references to symbols in them or not.
  • ld64 has a special case in its Obj-C category handling to also force-load object files that contain __swift sections when you use the -ObjC flag.

The safest thing to do here would be to put the conformances in the same file as the type itself so that if the type is fetched, the conformances are as well, but since those files are generated, that makes it less appealing.

This also isn't a problem for just NullValue, it's just the message where it's easiest to cause this problem because there are no symbols in the source file containing the conformance that are externally referenced. In the other Google_Protobuf_*+Extensions.swift files, there are other convenience helpers, so if a user users those, they'll end up pulling the whole file in. But I did notice that when I commented out all the tests except the failing ones to do my testing, some of the other conformances were missing as well, so it's possible that a user could run into this problem elsewhere through the right (or wrong) alignment of the stars in their code.

@thomasvl thomasvl removed the bug label Aug 27, 2020
@thomasvl
Copy link
Collaborator Author

So I guess the "safest" thing to do is move all the json conformances into a file that needs them so the they always get picked up? The public helpers in those files can be left as is and they will even strip correct if not needed?

@allevato
Copy link
Collaborator

So I guess the "safest" thing to do is move all the json conformances into a file that needs them so the they always get picked up?

Either that, or add a symbol in that file and then reference it from a file that will be included, but that might require the symbol to be public and thus leak out of the module.

The public helpers in those files can be left as is and they will even strip correct if not needed?

If by "strip" you mean "the object file will not be fetched by the linker from a static archive if it is not referenced" (as opposed to dead-code stripping), yes.

@thomasvl
Copy link
Collaborator Author

Keeping this open to track since it might make sense to tweak things to avoid things breaking if someone builds a lib on linux out of the library, but it is less critical at the moment.

@thomasvl
Copy link
Collaborator Author

ps - I forgot to mention, adding to my confusion with this when it first showed up, all of the other new tests added passed without the conformance. i.e. - they would pass for any Enum not just this one, which made me have to go try to figure out why/how they passed.

@thomasvl thomasvl changed the title New tests from #1048 fail on linux with closer to trunk compiler Internal protocol conformance can be stripped by accident Sep 14, 2020
@thomasvl
Copy link
Collaborator Author

As @allevato mentioned, SwiftPM dodges the issue, but the problem is always possible if someone builds things into a library and then links the library. While we can rearrange the sources to avoid this, it almost seems like some sorta automation to try and detect this would be a good things since we could run into this again if we added another protocol for something else.

@al45tair
Copy link
Contributor

al45tair commented Sep 3, 2021

lld is supposed to be compatible with GNU ld, which I believe has a --whole-archive option (similar to the -all_load option in macOS ld), which would function as a workaround for this problem.

@thomasvl
Copy link
Collaborator Author

thomasvl commented Mar 7, 2022

I'm going to go ahead and close this, unless we come up with a good way to test for this, even shuffling the sources to avoid the problem at the moment seems error prone as we could regress at anytime without noticing.

@thomasvl thomasvl closed this as completed Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants