-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix and test deduplicated header units #2516
Conversation
https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files "It is good practice to use the with keyword when dealing with file objects."
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicky formatting issues, probably none worth resetting testing.
tests/std/tests/P1502R1_standard_library_header_units/custom_format.py
Outdated
Show resolved
Hide resolved
tests/std/tests/P1502R1_standard_library_header_units/custom_format.py
Outdated
Show resolved
Hide resolved
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as resolved.
This comment was marked as resolved.
020353d
to
bd5ad55
Compare
it pains me that /scanDependencies isn't in makefile format. That would be a fun little project tbh. |
I don't think I understand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine for these early tests, but I really don't love the custom json comment handling.
@@ -112,7 +112,7 @@ | |||
"typeinfo", | |||
"unordered_map", | |||
"unordered_set", | |||
"use_ansi.h", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pre-existing, but it might be better to skip these in the python code instead of commenting them out here, as comments are not valid in json (but are in json5, we could just say this is json5 and parsers need to support comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine for these early tests, but I really don't love the custom json comment handling.
Yeah, it's definitely a hack and I wish there were a built-in way to ignore comments.
This is pre-existing, but it might be better to skip these in the python code instead of commenting them out here
The Python code could maintain a separate skip list. However, (with the exception of version
, yvals.h
, and yvals_core.h
) the headers that are commented out in header-units.json
need to be skipped by all users, not just the test harness, because those headers are variously incompatible with being treated as header units.
as comments are not valid in json (but are in json5, we could just say this is json5 and parsers need to support comments.
By agreement with the compiler and build system teams, the format of header-units.json
is the unofficial "JSON with comments" extension (I am not familiar with json5 but if it's valid in that format, that's good too - I assume that json5 is a superset so it will continue to remain valid).
Of course we could simply omit the ineligible headers, but then it would be difficult to see why they were missing (without a separate file explaining so), which is why I asked for comments to be allowed.
|
||
# We want to build everything that's mentioned in header-units.json, plus all of the | ||
# headers that were commented out for providing macros that control header inclusion. | ||
return sorted(set(buildAsHeaderUnits + ['version', 'yvals.h', 'yvals_core.h'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're already doing custom per-header stuff here, so again, we could keep the list of stuff to skip here (or in a separate json dict key) instead of using comments.
with open(sys.argv[1]) as file: | ||
test_log = json.load(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a shortcut for this, but I forget what it is :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remember later, I can go back and simplify this! 😸
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
Thanks for cleaning up all of these duplicated header units duplicated header units! |
So far, we've been testing independent header units. In that scenario,
<list>
,<type_traits>
, and<vector>
can be built simultaneously, but their IFC files contain duplicate machinery. This consumes more space (to store it), more time (to read it), and asks the compiler to do more work (to discard the duplicate machinery, which is not a trivial task).With the new
/scanDependencies
compiler option (implemented by Javier Matos Denizac during his compiler front-end internship; it supersedes the older/sourceDependencies:directives
option), we can build deduplicated header units. After scanning the STL to build up a graph of header dependencies, we can build them in topologically sorted order. For example, we'll build<type_traits>
early, then import that header unit while building<list>
and<vector>
. The scan phase is extremely fast (it can be done in parallel, in a single step). The build phase is slower and more complicated - this Python implementation builds topologically sorted "layers" in parallel, but each layer must be complete before the next layer can be built. (In theory, with a real build system like CMake/Ninja, more parallelism could be extracted, although I doubt there are significant gains.) The resulting IFCs are just as easy to consume (with a bunch of compiler options associating each header file to its IFC file, and a single LIB file containing codegen emitted along the way). The IFCs will also contain minimal duplicate machinery - the only potential source of such machinery is when multiplestl/inc
headers directly include non-stl/inc
headers. (Fortunately, in the recent-ish past, we developed a codebase convention that product code should always include<cmeow>
instead of<meow.h>
, which prevents CRT headers from being a significant problem.)While deduplicated header units are generally less stressful for the compiler to consume, they exercise different codepaths (as they have to import some machinery while exporting more). Therefore, this PR tests both the independent and deduplicated scenarios, and I expect that we'll want to retain that forever-ish.
Currently, this PR adds test coverage to the Python-powered GitHub test harness only, not the Perl-powered internal test harness. In principle there is nothing stopping the code from being ported, just dev time and my lack of familiarity with both Perl and the internal test harness. If we repeatedly encounter compiler bugs that could have been caught before Preview releases, we can investigate extending this test coverage in the future.
header-units.json
use_ansi.h
by commenting it out. Withinstl/inc
, it's included by onlyyvals.h
, and expectsyvals.h
to have defined_ITERATOR_DEBUG_LEVEL
andyvals_core.h
to have defined_STRINGIZE
. This is not compatible with how header units work. (They can emit macros, they can consume macros from the command line, and they can consume macros from headers if they have included those headers directly or indirectly, but they can't assume that another header has been ambiently included.)env.lst
/DTEST_TOPO_SORT
. (It's listed first so that I could test it quickly with--max-tests=1
.) This is an ordinary macro that the product code doesn't care about (hence not_Ugly
), but that the Python script looks for, and that thetest.cpp
can activate workarounds for. Because it isn't special to the Perl script, we'll end up running the coverage twice, which is fine. 🐶 ☕ 🔥test.cpp
custom_format.py
exportHeaderOptions
andstlHeaders
. We can concatenate as many arrays as we want when forming the command to execute, so it's simpler to keep the "list of compiler options" and "list of header files" separate until then.objFilenames
fromheaderUnitOptions
. Similarly, it's simpler to have two lists, one for "compiler options to consume IFCs" and another for "OBJ files to link against". This also eliminates a bit of complexity, where for EDG configurations (not yet active), we had to avoid appending the OBJ files. Now, we merely refrain from concatenating theobjFilenames
forTestType.COMPILE
(which uses/c
), as they are needed forTestType.RUN
only.headerUnitOptions
appending. It's slightly easier to read as a single line./Fo
is emitting them into the "current directory" (which is per-configuration), and we can consume them with just their filenames. (Observe that the same is happening for IFC files.)TestStep
. If we directly open files, we'll need absolute paths - that will happen forheader-units.json
and the.json
files emitted by/scanDependencies
.stl_header_units.lib
. I added this because it's very quick to produce, simplifies/shortens the final command line, and presumably is a more realistic example of what users will want to do.getImportableCxxLibraryHeaders()
. This reduces the bulk of the code, making it easier to read. I was especially concerned about confusion between the importable C++ library headers (which the independent scenario tests), and the "almost everything instl/inc
" scenario that the deduplicated scenario tests.consumeBuiltHeaderUnits
andhdr
. I use these names in the new code.compileTestCppWithEdg
. I don't believe that this can ever appear intest.flags
(if I'm wrong, we can easily put it back).'version', 'yvals.h', 'yvals_core.h'
is extremely important (well, at least the last two), because they're included by everything and avoid ~370 KB of duplication per IFC. They're commented out inheader-units.json
due to a dependency scanning scenario that doesn't affect our work here.print_failures.py
with
. According to the Python docs, "It is good practice to use thewith
keyword when dealing with file objects." While working on the topo sort changes, I learned how Python needs some RAII here.custom_format.py
,custombuild.pl
memory
beforememory_resource
andstring
beforestring_view
, following lexicographic order and the Standard table.tests/utils/stl/util.py
__msvc_int128.hpp
(introduced in Move<ranges>
and<format>
into C++20 #2518 recently)numeric_limits
andcommon_type
. This is a perma-workaround for VSO-1475786 "Standard Library Header Units: Deduplication emits bogus errors related tonumeric_limits
". They're unnecessary because this header includes<bit>
which includes<limits>
, and it includes<concepts>
which includes<type_traits>
.