Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and test deduplicated header units #2516

Merged
merged 19 commits into from
Feb 12, 2022

Conversation

StephanTLavavej
Copy link
Member

@StephanTLavavej StephanTLavavej commented Feb 3, 2022

So far, we've been testing independent header units. In that scenario, <list>, <type_traits>, and <vector> can be built simultaneously, but their IFC files contain duplicate machinery. This consumes more space (to store it), more time (to read it), and asks the compiler to do more work (to discard the duplicate machinery, which is not a trivial task).

With the new /scanDependencies compiler option (implemented by Javier Matos Denizac during his compiler front-end internship; it supersedes the older /sourceDependencies:directives option), we can build deduplicated header units. After scanning the STL to build up a graph of header dependencies, we can build them in topologically sorted order. For example, we'll build <type_traits> early, then import that header unit while building <list> and <vector>. The scan phase is extremely fast (it can be done in parallel, in a single step). The build phase is slower and more complicated - this Python implementation builds topologically sorted "layers" in parallel, but each layer must be complete before the next layer can be built. (In theory, with a real build system like CMake/Ninja, more parallelism could be extracted, although I doubt there are significant gains.) The resulting IFCs are just as easy to consume (with a bunch of compiler options associating each header file to its IFC file, and a single LIB file containing codegen emitted along the way). The IFCs will also contain minimal duplicate machinery - the only potential source of such machinery is when multiple stl/inc headers directly include non-stl/inc headers. (Fortunately, in the recent-ish past, we developed a codebase convention that product code should always include <cmeow> instead of <meow.h>, which prevents CRT headers from being a significant problem.)

While deduplicated header units are generally less stressful for the compiler to consume, they exercise different codepaths (as they have to import some machinery while exporting more). Therefore, this PR tests both the independent and deduplicated scenarios, and I expect that we'll want to retain that forever-ish.

Currently, this PR adds test coverage to the Python-powered GitHub test harness only, not the Perl-powered internal test harness. In principle there is nothing stopping the code from being ported, just dev time and my lack of familiarity with both Perl and the internal test harness. If we repeatedly encounter compiler bugs that could have been caught before Preview releases, we can investigate extending this test coverage in the future.

  • header-units.json
    • Fix use_ansi.h by commenting it out. Within stl/inc, it's included by only yvals.h, and expects yvals.h to have defined _ITERATOR_DEBUG_LEVEL and yvals_core.h to have defined _STRINGIZE. This is not compatible with how header units work. (They can emit macros, they can consume macros from the command line, and they can consume macros from headers if they have included those headers directly or indirectly, but they can't assume that another header has been ambiently included.)
  • env.lst
    • Add /DTEST_TOPO_SORT. (It's listed first so that I could test it quickly with --max-tests=1.) This is an ordinary macro that the product code doesn't care about (hence not _Ugly), but that the Python script looks for, and that the test.cpp can activate workarounds for. Because it isn't special to the Perl script, we'll end up running the coverage twice, which is fine. 🐶 ☕ 🔥
  • test.cpp
    • Add workarounds for VSO-1471374 (fatal error C1116) and VSO-1471382 (error C2672), compiler bugs that I've reported to our compiler front-end wizard @cdacamar. 🪄
  • custom_format.py
    • This may be easier to read as "totally overhauled file that does both old and new scenarios", but I have commits that are structured as a series of cleanups, followed by the new additions.
    • Use formatted string literals. I think that this makes the results slightly easier to visualize.
    • Separate exportHeaderOptions and stlHeaders. We can concatenate as many arrays as we want when forming the command to execute, so it's simpler to keep the "list of compiler options" and "list of header files" separate until then.
    • Extract objFilenames from headerUnitOptions. Similarly, it's simpler to have two lists, one for "compiler options to consume IFCs" and another for "OBJ files to link against". This also eliminates a bit of complexity, where for EDG configurations (not yet active), we had to avoid appending the OBJ files. Now, we merely refrain from concatenating the objFilenames for TestType.COMPILE (which uses /c), as they are needed for TestType.RUN only.
    • Simplify headerUnitOptions appending. It's slightly easier to read as a single line.
    • We don't need absolute paths for object files. /Fo is emitting them into the "current directory" (which is per-configuration), and we can consume them with just their filenames. (Observe that the same is happening for IFC files.)
      • This works only when we're running commands through TestStep. If we directly open files, we'll need absolute paths - that will happen for header-units.json and the .json files emitted by /scanDependencies.
    • Add stl_header_units.lib. I added this because it's very quick to produce, simplifies/shortens the final command line, and presumably is a more realistic example of what users will want to do.
    • Extract getImportableCxxLibraryHeaders(). This reduces the bulk of the code, making it easier to read. I was especially concerned about confusion between the importable C++ library headers (which the independent scenario tests), and the "almost everything in stl/inc" scenario that the deduplicated scenario tests.
    • Rename to consumeBuiltHeaderUnits and hdr. I use these names in the new code.
    • Simplify compileTestCppWithEdg. I don't believe that this can ever appear in test.flags (if I'm wrong, we can easily put it back).
    • Implement topo sort. This is extensively commented so I'll avoid repeating that here.
      • Adding 'version', 'yvals.h', 'yvals_core.h' is extremely important (well, at least the last two), because they're included by everything and avoid ~370 KB of duplication per IFC. They're commented out in header-units.json due to a dependency scanning scenario that doesn't affect our work here.
  • print_failures.py
    • This is a partially related cleanup to use with. According to the Python docs, "It is good practice to use the with keyword when dealing with file objects." While working on the topo sort changes, I learned how Python needs some RAII here.
  • custom_format.py, custombuild.pl
    • Sort memory before memory_resource and string before string_view, following lexicographic order and the Standard table.
  • tests/utils/stl/util.py
  • __msvc_int128.hpp (introduced in Move <ranges> and <format> into C++20 #2518 recently)
    • Remove unnecessary forward declarations of numeric_limits and common_type. This is a perma-workaround for VSO-1475786 "Standard Library Header Units: Deduplication emits bogus errors related to numeric_limits". They're unnecessary because this header includes <bit> which includes <limits>, and it includes <concepts> which includes <type_traits>.

@StephanTLavavej StephanTLavavej added bug Something isn't working test Related to test code labels Feb 3, 2022
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner February 3, 2022 10:31
@StephanTLavavej

This comment was marked as resolved.

Copy link
Member

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicky formatting issues, probably none worth resetting testing.

@StephanTLavavej

This comment was marked as off-topic.

@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej removed their assignment Feb 10, 2022
@barcharcraz
Copy link
Member

it pains me that /scanDependencies isn't in makefile format. That would be a fun little project tbh.

@cdacamar
Copy link
Contributor

it pains me that /scanDependencies isn't in makefile format. That would be a fun little project tbh.

I don't think I understand. /scanDependencies doesn't provide you with a build order, it just reports what direct modules and header units a translation unit depends on.

Copy link
Member

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for these early tests, but I really don't love the custom json comment handling.

@@ -112,7 +112,7 @@
"typeinfo",
"unordered_map",
"unordered_set",
"use_ansi.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing, but it might be better to skip these in the python code instead of commenting them out here, as comments are not valid in json (but are in json5, we could just say this is json5 and parsers need to support comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for these early tests, but I really don't love the custom json comment handling.

Yeah, it's definitely a hack and I wish there were a built-in way to ignore comments.

This is pre-existing, but it might be better to skip these in the python code instead of commenting them out here

The Python code could maintain a separate skip list. However, (with the exception of version, yvals.h, and yvals_core.h) the headers that are commented out in header-units.json need to be skipped by all users, not just the test harness, because those headers are variously incompatible with being treated as header units.

as comments are not valid in json (but are in json5, we could just say this is json5 and parsers need to support comments.

By agreement with the compiler and build system teams, the format of header-units.json is the unofficial "JSON with comments" extension (I am not familiar with json5 but if it's valid in that format, that's good too - I assume that json5 is a superset so it will continue to remain valid).

Of course we could simply omit the ineligible headers, but then it would be difficult to see why they were missing (without a separate file explaining so), which is why I asked for comments to be allowed.


# We want to build everything that's mentioned in header-units.json, plus all of the
# headers that were commented out for providing macros that control header inclusion.
return sorted(set(buildAsHeaderUnits + ['version', 'yvals.h', 'yvals_core.h']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're already doing custom per-header stuff here, so again, we could keep the list of stuff to skip here (or in a separate json dict key) instead of using comments.

Comment on lines +12 to +13
with open(sys.argv[1]) as file:
test_log = json.load(file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a shortcut for this, but I forget what it is :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remember later, I can go back and simplify this! 😸

@StephanTLavavej
Copy link
Member Author

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit bb0cdf6 into microsoft:main Feb 12, 2022
@StephanTLavavej StephanTLavavej deleted the topo_sort branch February 12, 2022 01:55
@CaseyCarter
Copy link
Member

Thanks for cleaning up all of these duplicated header units duplicated header units!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test Related to test code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants