Fix and test deduplicated header units #2516

StephanTLavavej · 2022-02-03T10:31:46Z

So far, we've been testing independent header units. In that scenario, <list>, <type_traits>, and <vector> can be built simultaneously, but their IFC files contain duplicate machinery. This consumes more space (to store it) and more time (to read it), and it asks the compiler to do more work (to discard the duplicate machinery, which is not a trivial task).

With the new /scanDependencies compiler option (implemented by Javier Matos Denizac during his compiler front-end internship; it supersedes the older /sourceDependencies:directives option), we can build deduplicated header units. After scanning the STL to build up a graph of header dependencies, we can build them in topologically sorted order. For example, we'll build <type_traits> early, then import that header unit while building <list> and <vector>.

The scan phase is extremely fast (it can be done in parallel, in a single step). The build phase is slower and more complicated - this Python implementation builds topologically sorted "layers" in parallel, but each layer must be complete before the next layer can be built. (In theory, with a real build system like CMake/Ninja, more parallelism could be extracted, although I doubt there are significant gains.)

The resulting IFCs are just as easy to consume (with a bunch of compiler options associating each header file with its IFC file, and a single LIB file containing codegen emitted along the way). The IFCs will also contain minimal duplicate machinery - the only potential source of such machinery is when multiple stl/inc headers directly include non-stl/inc headers. (Fortunately, in the recent-ish past, we developed a codebase convention that product code should always include <cmeow> instead of <meow.h>, which prevents CRT headers from being a significant problem.)
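To make the "layers" idea concrete, here is a minimal sketch of grouping headers so that each layer depends only on earlier layers. It is not the code in custom_format.py; the function name topo_layers and the tiny dependency graph are hypothetical, and the sketch assumes every dependency also appears as a key in the graph.

```python
def topo_layers(deps):
    """Group headers into layers; each layer depends only on earlier layers.

    deps maps each header to the set of headers it directly imports.
    Assumption: every dependency also appears as a key in deps.
    """
    remaining = {hdr: set(d) for hdr, d in deps.items()}
    layers = []
    while remaining:
        # Headers whose dependencies have all been built form the next layer.
        ready = sorted(hdr for hdr, d in remaining.items() if not d)
        if not ready:
            raise ValueError('dependency cycle (or unknown dependency) detected')
        layers.append(ready)
        for hdr in ready:
            del remaining[hdr]
        for d in remaining.values():
            d.difference_update(ready)
    return layers


# Hypothetical, heavily simplified dependency graph.
example = {
    'yvals_core.h': set(),
    'type_traits': {'yvals_core.h'},
    'list': {'type_traits'},
    'vector': {'type_traits'},
}
layers = topo_layers(example)
print(layers)
```

Everything within one layer can be built in parallel, but a layer cannot start until the previous one has finished, which is the limitation described above.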

While deduplicated header units are generally less stressful for the compiler to consume, they exercise different codepaths (as they have to import some machinery while exporting more). Therefore, this PR tests both the independent and deduplicated scenarios, and I expect that we'll want to retain that forever-ish.

Currently, this PR adds test coverage to the Python-powered GitHub test harness only, not the Perl-powered internal test harness. In principle there is nothing stopping the code from being ported, just dev time and my lack of familiarity with both Perl and the internal test harness. If we repeatedly encounter compiler bugs that could have been caught before Preview releases, we can investigate extending this test coverage in the future.

header-units.json
- Fix use_ansi.h by commenting it out. Within stl/inc, it's included by only yvals.h, and expects yvals.h to have defined _ITERATOR_DEBUG_LEVEL and yvals_core.h to have defined _STRINGIZE. This is not compatible with how header units work. (They can emit macros, they can consume macros from the command line, and they can consume macros from headers if they have included those headers directly or indirectly, but they can't assume that another header has been ambiently included.)
env.lst
- Add /DTEST_TOPO_SORT. (It's listed first so that I could test it quickly with --max-tests=1.) This is an ordinary macro that the product code doesn't care about (hence not _Ugly), but that the Python script looks for, and that the test.cpp can activate workarounds for. Because it isn't special to the Perl script, we'll end up running the coverage twice, which is fine. 🐶 ☕ 🔥
test.cpp
- Add workarounds for VSO-1471374 (fatal error C1116) and VSO-1471382 (error C2672), compiler bugs that I've reported to our compiler front-end wizard @cdacamar. 🪄
custom_format.py
- This may be easier to read as "totally overhauled file that does both old and new scenarios", but I have commits that are structured as a series of cleanups, followed by the new additions.
- Use formatted string literals. I think that this makes the results slightly easier to visualize.
- Separate exportHeaderOptions and stlHeaders. We can concatenate as many arrays as we want when forming the command to execute, so it's simpler to keep the "list of compiler options" and "list of header files" separate until then.
- Extract objFilenames from headerUnitOptions. Similarly, it's simpler to have two lists, one for "compiler options to consume IFCs" and another for "OBJ files to link against". This also eliminates a bit of complexity, where for EDG configurations (not yet active), we had to avoid appending the OBJ files. Now, we merely refrain from concatenating the objFilenames for TestType.COMPILE (which uses /c), as they are needed for TestType.RUN only.
- Simplify headerUnitOptions appending. It's slightly easier to read as a single line.
- We don't need absolute paths for object files. /Fo is emitting them into the "current directory" (which is per-configuration), and we can consume them with just their filenames. (Observe that the same is happening for IFC files.)
  - This works only when we're running commands through TestStep. If we directly open files, we'll need absolute paths - that will happen for header-units.json and the .json files emitted by /scanDependencies.
- Add stl_header_units.lib. I added this because it's very quick to produce, simplifies/shortens the final command line, and presumably is a more realistic example of what users will want to do.
- Extract getImportableCxxLibraryHeaders(). This reduces the bulk of the code, making it easier to read. I was especially concerned about confusion between the importable C++ library headers (which the independent scenario tests), and the "almost everything in stl/inc" scenario that the deduplicated scenario tests.
- Rename to consumeBuiltHeaderUnits and hdr. I use these names in the new code.
- Simplify compileTestCppWithEdg. I don't believe that this can ever appear in test.flags (if I'm wrong, we can easily put it back).
- Implement topo sort. This is extensively commented so I'll avoid repeating that here.
  - Adding 'version', 'yvals.h', 'yvals_core.h' is extremely important (well, at least the last two), because they're included by everything, and building them as header units avoids ~370 KB of duplication per IFC. They're commented out in header-units.json due to a dependency scanning scenario that doesn't affect our work here.
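As a rough illustration of keeping "compiler options to consume IFCs" and "OBJ files to link against" in separate lists until the command is formed, consider the sketch below. The helper name consume_options and the exact command layout are assumptions for illustration; the real script runs its commands through TestStep and, after the stl_header_units.lib change, links a single LIB instead of individual OBJ files.

```python
def consume_options(headers):
    """Return (compiler options, OBJ filenames) for consuming built header units."""
    header_unit_options = []
    obj_filenames = []
    for hdr in headers:
        # Assumed spelling: /headerUnit:angle followed by header=ifc, using bare filenames.
        header_unit_options += ['/headerUnit:angle', f'{hdr}={hdr}.ifc']
        obj_filenames.append(f'{hdr}.obj')
    return header_unit_options, obj_filenames


options, objs = consume_options(['type_traits', 'vector'])

# Compile-only: the OBJ files are not needed with /c.
compile_cmd = ['cl', '/c', 'test.cpp', *options]
# Compile and link: concatenate the OBJ files only here.
run_cmd = ['cl', 'test.cpp', *options, *objs]
print(run_cmd)
```

Because the lists are only joined at the last moment, the compile-only case needs no special logic to filter the OBJ files back out.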
print_failures.py
- This is a partially related cleanup to use the with statement. According to the Python docs, "It is good practice to use the with keyword when dealing with file objects." While working on the topo sort changes, I learned how Python needs some RAII here.
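A small self-contained sketch of the RAII-like behavior: the file object is closed as soon as the with block exits. The temporary file and its contents are made up for the example and are not the real test log format.

```python
import json
import os
import tempfile

# Write a tiny stand-in for a test log so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'test_log.json')
with open(path, 'w') as file:
    json.dump({'tests': []}, file)

with open(path) as file:
    test_log = json.load(file)

# The file is closed once the with block exits, even if json.load() had raised.
print(file.closed, test_log)
```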
custom_format.py, custombuild.pl
- Sort memory before memory_resource and string before string_view, following lexicographic order and the Standard table.
tests/utils/stl/util.py
- Replace CRLFs with LFs to improve test output as displayed in Azure Pipelines.
- This is an unrelated cleanup, but I wanted it while investigating failures here. (Test logs from this test specifically are also affected by Azure Pipelines: Investigate uploading test logs as artifacts #2557.)
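For reference, the normalization itself can be as simple as the sketch below. The function name normalize_newlines is hypothetical, and where exactly util.py applies the replacement is not shown here.

```python
def normalize_newlines(text):
    """Replace Windows-style CRLF line endings with LF."""
    return text.replace('\r\n', '\n')


sample = 'error C2672\r\nfatal error C1116\r\n'
normalized = normalize_newlines(sample)
print(repr(normalized))
```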
__msvc_int128.hpp (introduced in Move <ranges> and <format> into C++20 #2518 recently)
- Remove unnecessary forward declarations of numeric_limits and common_type. This is a perma-workaround for VSO-1475786 "Standard Library Header Units: Deduplication emits bogus errors related to numeric_limits". They're unnecessary because this header includes <bit> which includes <limits>, and it includes <concepts> which includes <type_traits>.

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files "It is good practice to use the with keyword when dealing with file objects."

CaseyCarter

Nitpicky formatting issues, probably none worth resetting testing.

barcharcraz · 2022-02-10T07:57:30Z

it pains me that /scanDependencies isn't in makefile format. That would be a fun little project tbh.

cdacamar · 2022-02-10T08:12:37Z

> it pains me that /scanDependencies isn't in makefile format. That would be a fun little project tbh.

I don't think I understand. /scanDependencies doesn't provide you with a build order, it just reports what direct modules and header units a translation unit depends on.
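To illustrate the point that the scan output only lists direct dependencies, here is a sketch that extracts them from one scan result; ordering them is then a separate step. The sample JSON is a trimmed, assumed shape (a "rules" array whose entries have "requires" items with a "source-path"), and the paths and the function name direct_dependencies are invented for the example.

```python
import json
import ntpath

# Assumed, trimmed-down shape of one scan result; paths are made up.
SCAN_OUTPUT = r'''
{
    "version": 1,
    "revision": 0,
    "rules": [
        {
            "primary-output": "vector.obj",
            "requires": [
                {"source-path": "C:\\stl\\inc\\type_traits"},
                {"source-path": "C:\\stl\\inc\\yvals_core.h"}
            ]
        }
    ]
}
'''


def direct_dependencies(scan_json_text):
    """Return the sorted basenames of the headers that one scanned file directly requires."""
    rules = json.loads(scan_json_text)['rules']
    return sorted(ntpath.basename(req['source-path'])
                  for rule in rules
                  for req in rule.get('requires', []))


deps = direct_dependencies(SCAN_OUTPUT)
print(deps)
```

Collecting these per-header lists yields the dependency graph; nothing in a single result says when a header should be built relative to the others.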

barcharcraz

Fine for these early tests, but I really don't love the custom json comment handling.

barcharcraz · 2022-02-10T08:06:15Z

stl/inc/header-units.json

@@ -112,7 +112,7 @@
        "typeinfo",
        "unordered_map",
        "unordered_set",
-        "use_ansi.h",


This is pre-existing, but it might be better to skip these in the python code instead of commenting them out here, as comments are not valid in json (but are in json5; we could just say this is json5 and parsers need to support comments).

StephanTLavavej

> Fine for these early tests, but I really don't love the custom json comment handling.

Yeah, it's definitely a hack and I wish there were a built-in way to ignore comments.

> This is pre-existing, but it might be better to skip these in the python code instead of commenting them out here

The Python code could maintain a separate skip list. However, (with the exception of version, yvals.h, and yvals_core.h) the headers that are commented out in header-units.json need to be skipped by all users, not just the test harness, because those headers are variously incompatible with being treated as header units.

> as comments are not valid in json (but are in json5, we could just say this is json5 and parsers need to support comments).

By agreement with the compiler and build system teams, the format of header-units.json is the unofficial "JSON with comments" extension (I am not familiar with json5 but if it's valid in that format, that's good too - I assume that json5 is a superset so it will continue to remain valid).

Of course we could simply omit the ineligible headers, but then it would be difficult to see why they were missing (without a separate file explaining so), which is why I asked for comments to be allowed.
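For readers wondering what "custom json comment handling" amounts to, a minimal sketch follows. It is not the harness's actual code: the sample file contents, the key names, and the function name load_json_with_comments are assumptions, and the regex assumes that no string value contains '//' and that removing a comment never leaves a trailing comma.

```python
import json
import re

# Assumed shape of the file; the comment text is invented for the example.
HEADER_UNITS_JSON = '''
{
    "Version": "1.0",
    "BuildAsHeaderUnits": [
        "type_traits",
        // "use_ansi.h", // hypothetical note explaining why it is excluded
        "vector"
    ]
}
'''


def load_json_with_comments(text):
    """Strip '//' comments, then parse the remainder as ordinary JSON."""
    # Drop everything from '//' to the end of each line.
    stripped = re.sub(r'//.*', '', text)
    return json.loads(stripped)


headers = load_json_with_comments(HEADER_UNITS_JSON)['BuildAsHeaderUnits']
print(headers)
```

The explanation for each excluded header stays next to its entry in the file, at the cost of every consumer needing a comment-tolerant parser.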

barcharcraz · 2022-02-10T08:10:43Z

tests/std/tests/P1502R1_standard_library_header_units/custom_format.py

+
+    # We want to build everything that's mentioned in header-units.json, plus all of the
+    # headers that were commented out for providing macros that control header inclusion.
+    return sorted(set(buildAsHeaderUnits + ['version', 'yvals.h', 'yvals_core.h']))


we're already doing custom per-header stuff here, so again, we could keep the list of stuff to skip here (or in a separate json dict key) instead of using comments.

barcharcraz · 2022-02-10T08:15:29Z

tools/scripts/print_failures.py

+with open(sys.argv[1]) as file:
+    test_log = json.load(file)


I think there's a shortcut for this, but I forget what it is :(

StephanTLavavej

If you remember later, I can go back and simplify this! 😸

StephanTLavavej · 2022-02-11T13:26:20Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

CaseyCarter · 2022-02-12T03:19:20Z

Thanks for cleaning up all of these duplicated header units duplicated header units!

StephanTLavavej added 15 commits January 31, 2022 18:52

print_failures.py: Use with.

01bd46c

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files "It is good practice to use the with keyword when dealing with file objects."

env.lst: Add /DTEST_TOPO_SORT.

1abeb69

test.cpp: Work around VSO-1471374 (fatal error C1116).

dd62d35

test.cpp: Work around VSO-1471382 (error C2672).

ea94d7b

custom_format.py: Use formatted string literals.

b0cbbae

custom_format.py: Separate exportHeaderOptions and stlHeaders.

4280557

custom_format.py: Extract objFilenames from headerUnitOptions.

fc420a7

custom_format.py: Simplify headerUnitOptions appending.

a68e2b1

custom_format.py: We don't need absolute paths for object files.

737e1b6

custom_format.py: Add stl_header_units.lib.

5fe58e8

custom_format.py: Extract getImportableCxxLibraryHeaders().

a425bfc

custom_format.py: Rename to consumeBuiltHeaderUnits, hdr.

fc29126

custom_format.py: Simplify compileTestCppWithEdg.

d1ff9f2

custom_format.py: Implement topo sort.

b25c5d5

header-units.json: Fix use_ansi.h by commenting out.

8630f32

StephanTLavavej added the bug (Something isn't working) and test (Related to test code) labels Feb 3, 2022

StephanTLavavej requested a review from a team as a code owner February 3, 2022 10:31

CaseyCarter approved these changes Feb 5, 2022

CaseyCarter approved these changes Feb 8, 2022

CaseyCarter approved these changes Feb 9, 2022

StephanTLavavej assigned StephanTLavavej and barcharcraz Feb 9, 2022

Merge branch 'main' into topo_sort

1025c7f

StephanTLavavej added 3 commits February 9, 2022 18:31

Sort cat before catapult.

09a7d08

tests/utils/stl/util.py: Replace CRLFs with LFs.

d946b4f

(Tested in Azure Pipelines.)

__msvc_int128.hpp: Remove unnecessary forward declarations.

bd5ad55

StephanTLavavej force-pushed the topo_sort branch from 020353d to bd5ad55 Compare February 10, 2022 02:58

CaseyCarter approved these changes Feb 10, 2022

StephanTLavavej removed their assignment Feb 10, 2022

barcharcraz approved these changes Feb 10, 2022

StephanTLavavej unassigned barcharcraz Feb 10, 2022

StephanTLavavej self-assigned this Feb 11, 2022

StephanTLavavej merged commit bb0cdf6 into microsoft:main Feb 12, 2022

StephanTLavavej deleted the topo_sort branch February 12, 2022 01:55

StephanTLavavej mentioned this pull request Feb 12, 2022

Internal test coverage for deduplicated header units #2563

Merged
