Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] Per-header header modules #86193

Closed
vgvassilev opened this issue Mar 21, 2024 · 14 comments · Fixed by #107638 or #110501
Closed

[libc++] Per-header header modules #86193

vgvassilev opened this issue Mar 21, 2024 · 14 comments · Fixed by #107638 or #110501
Assignees
Labels
clang:modules C++20 modules and Clang Header Modules libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Comments

@vgvassilev
Copy link
Contributor

Commit 571178a introduces per header file module with a main motivation to enable detection of cyclic includes. This means that now, instead of having a single binary artifact std.pcm or similar we get hundreds of pcm files for the same content. That's problematic for us due to several reasons:

  • Makes modules redistribution harder -- we carry build artifacts across nodes and it makes it difficult to enumerate all stl-related modules including private ones.
  • Efficiency -- we care about binary size and in-memory footprint. Going for a many pcm files has an overhead. Consider we have several textual files (such as cassert). The textual content becomes duplicated in every pcm file and in all dependent pcm files too.

If the motivation is only to be able to diagnose cyclic includes we can do that as part of the build process of libc++. For example, echo "#include <libcxx/header>" | clang -xc++ -fsyntax-only - will be a sufficient check ensuring header consistency. Alternatively, we could have a "debug" modulemap with that structure.

If there is more motivation to the one I understood, I'd like to suggest having a separate modulemap, called std.modulemap which keeps the headers into a single pcm file and clients like us can use it instead. Note that the module std [system] { header "__std_clang_module" export * } does not work for us because it triggers building of all gazillion pcms.

@ian-twilightcoder, @ldionne, depending on the direction we get in this issue, I can work on creating the std.modulemap and opening a pull request.

cc: @hahnjo, @dpiparo, @ChuanqiXu9.

@vgvassilev vgvassilev added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. clang:modules C++20 modules and Clang Header Modules labels Mar 21, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 21, 2024

@llvm/issue-subscribers-clang-modules

Author: Vassil Vassilev (vgvassilev)

Commit 571178a introduces per header file module with a main motivation to enable detection of cyclic includes. This means that now, instead of having a single binary artifact `std.pcm` or similar we get hundreds of pcm files for the same content. That's problematic for us due to several reasons: * Makes modules redistribution harder -- we carry build artifacts across nodes and it makes it difficult to enumerate all stl-related modules including private ones. * Efficiency -- we care about binary size and in-memory footprint. Going for a many pcm files has an overhead. Consider we have several textual files (such as cassert). The textual content becomes duplicated in every `pcm` file and in all dependent `pcm` files too.

If the motivation is only to be able to diagnose cyclic includes we can do that as part of the build process of libc++. For example, echo "#include &lt;libcxx/header&gt;" | clang -xc++ -fsyntax-only - will be a sufficient check ensuring header consistency. Alternatively, we could have a "debug" modulemap with that structure.

If there is more motivation to the one I understood, I'd like to suggest having a separate modulemap, called std.modulemap which keeps the headers into a single pcm file and clients like us can use it instead. Note that the module std [system] { header "__std_clang_module" export * } does not work for us because it triggers building of all gazillion pcms.

@ian-twilightcoder, @ldionne, depending on the direction we get in this issue, I can work on creating the std.modulemap and opening a pull request.

cc: @hahnjo, @dpiparo, @ChuanqiXu9.

@EricWF
Copy link
Member

EricWF commented Apr 2, 2024

I agree this is broken. I've had success wrapping all the submodules up in a single top level std module. This prevents the build race conditions I run into with LLVM, and I suspect would resolve your issues as well?

@vgvassilev
Copy link
Contributor Author

I agree this is broken. I've had success wrapping all the submodules up in a single top level std module. This prevents the build race conditions I run into with LLVM, and I suspect would resolve your issues as well?

Do you have already a patch?

@philnik777 philnik777 changed the title [libc++] [cxxmodules] Per-header header modules [libc++] Per-header header modules Apr 18, 2024
@ldionne
Copy link
Member

ldionne commented Jul 9, 2024

My preference here would be to maintain a single modulemap and to simply gradually merge back modules without creating cycles. I suspect it is possible to drive towards a folder organization + modulemap that are consistent, don't introduce cycles and allow having decently sized submodules. If that's feasible, I think that would be the most viable alternative.

@vgvassilev
Copy link
Contributor Author

@ldionne, I believe can detect cycles using different approaches (standalone per-header compilation or using the modulemap submodule infrastructure). Do I understand correctly your comment that we will have again a single module for all of stl?

@ian-twilightcoder
Copy link
Contributor

It's not about include cycles. Each C standard library header has to be in its own top level module to support module layering between libc++, clang, and libc*. Take inttypes.h and stdint.h. Each libc++ header does an #include_next of the libc header. The problem arises when inttypes.h includes stdint.h. That creates two module cycles between the cxx and c modules: cxx:inttypes.h -> c:inttypes.h -> cxx:stdint.h and c:inttypes.h -> cxx:stdint.h -> c:stdint.h. The only way to break those two cycles is to make top level modules for all 4 headers: cxx_inttypes:inttypes.h -> c_inttypes:inttypes.h -> cxx_stdint:stdint.h -> c_stdint:stdint.h.

Each library has no way of knowing what the others' versions of the C standard library headers will include, so all of the C standard library headers in all libraries must be top level modules. In C++, the C standard library headers are quite intertwined with the C++ standard library headers by standard, and so those too need to be top level modules.

libc++ cannot go back to a single module for all of its headers. However, we can greatly reduce the number of modules by grouping the private implementation headers. I had a PR that did that programmatically https://reviews.llvm.org/D155141. That takes the module count from ~1000 to ~200. However, it was poorly received. I tried do something similar by hand, but it was utterly miserable to maintain because the layering is greatly perturbed every time an implementation header is added, or changes what it includes. Trying to break an 8+ module cycle is very difficult, the bad vertex is difficult to identify, and rearranging the modules to eliminate the bad vertex is also very tricky. Louis thinks it all can be done by hand, but if that's true then it requires someone with much more knowledge about how libc++ is implemented, organized, and layered than me.

* if your libc does not have a module map then you can't use clang modules. As far as I know, Apple is the only libc that has module maps. I'll leave it to @Bigcheese to explain the gory details of why a modular header can't include a non-modular header if anyone wants to know.

@vgvassilev
Copy link
Contributor Author

vgvassilev commented Jul 9, 2024

@ian-twilightcoder, thank you for your post and background.

The only way to break those two cycles is to make top level modules for all 4 headers: cxx_inttypes:inttypes.h -> c_inttypes:inttypes.h -> cxx_stdint:stdint.h -> c_stdint:stdint.h.

Most of the times one can get away with that using modules re-exports. It might be a little more tricky to make it work across libraries. However, I am wondering how the single module approach worked until recently.

libc++ cannot go back to a single module for all of its headers.

That was the case before, right? I am wondering what has changed. FWIW, we are able to make libstdc++ having a single module (since version 4.9) and I am pretty sure it was more entangled than libc++ was or will ever be.

By the way, if I edit the libc++ modulemap file and wrap everything into a top-level module I get working single module. I am really confused. I am missing something...

@ldionne
Copy link
Member

ldionne commented Jul 9, 2024

This is what I'd like us to do: #98214
I think we can alleviate 80% of the existing pain with minimal engineering, minimal changes to the code bases and minimal added complexity to the library. I might be wrong, but if I am wrong, then there's something fundamentally wrong with Clang modules because we are not doing anything that fancy in libc++.

Only the parts of the library that can be re-included from C headers should be tricky to implement and will probably require its own top-level modules, but I can't imagine that everything else also requires its own module.

@ian-twilightcoder
Copy link
Contributor

I think you'll find that __algorithm, __chrono, __functional, __memory, __ranges, __tuple, __type_traits, __utility (and others) are too intertwined for a simple approach like that, but maybe you can do some more sophisticated cleanup so that they layer better and aren't a continuous source of module cycles as their headers change.

@ldionne
Copy link
Member

ldionne commented Jul 9, 2024

That's the idea. I expect that only a few things in __algorithm, __chrono & friends might cause cycles, and those can (and should) probably extracted into some other module. Maybe they belong in some kind of __core module or whatever. I think we'd find out as we go.

@ian-twilightcoder If we introduce cycles, would we find out in our CI testing? In particular, would we find out about re-introducing cycles with the C stdlib headers in the SDK if we happened to break that, or would we only find out downstream a while later?

@ian-twilightcoder
Copy link
Contributor

Yeah, the cycles will show up in the modules CI tests. As long as you keep the c stdlib headers as their own top level modules, you should remain acyclic with the underlying libc.

@vgvassilev
Copy link
Contributor Author

Yeah, the cycles will show up in the modules CI tests. As long as you keep the c stdlib headers as their own top level modules, you should remain acyclic with the underlying libc.

But that was my point initially. This mode is mostly for making sure things do not regress in layering but that’s not necessarily the right way to ship a release. You can have that mode in the CI without duplicating too much modulemap infrastructure.

@ian-twilightcoder
Copy link
Contributor

Modules have to be acyclic. And when libc is covered by a module, the only way to have acyclic modules is for the C stdlib headers like stddef.h to all have their own top level modules.

@vgvassilev
Copy link
Contributor Author

vgvassilev commented Jul 10, 2024

Modules have to be acyclic. And when libc is covered by a module, the only way to have acyclic modules is for the C stdlib headers like stddef.h to all have their own top level modules.

I respectfully disagree with this statement. I’ve mentioned that one way to deal with entangled headers is re-exporting them from both modules. Could you give an example with the current libc++ codebase which requires top-level modules? We currently overwrite the libc++ modulemap to build a single module and that works without problems.

@ldionne ldionne self-assigned this Sep 10, 2024
ldionne added a commit that referenced this issue Sep 26, 2024
…es (#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes #86193
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this issue Sep 26, 2024
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this issue Sep 27, 2024
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
ldionne added a commit to ldionne/llvm-project that referenced this issue Sep 30, 2024
…es (llvm#107638)

This is a re-application of bc6bd3b which was reverted in f11abac
because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
ldionne added a commit to ldionne/llvm-project that referenced this issue Sep 30, 2024
…es (llvm#107638)

This is a re-application of bc6bd3b which was reverted in f11abac
because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
ldionne added a commit that referenced this issue Sep 30, 2024
…es (#110501)

This is a re-application of bc6bd3b which was reverted in
f11abac because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes #86193
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this issue Oct 2, 2024
…es (llvm#110501)

This is a re-application of bc6bd3b which was reverted in
f11abac because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this issue Oct 2, 2024
…es (llvm#110501)

This is a re-application of bc6bd3b which was reverted in
f11abac because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
xgupta pushed a commit to xgupta/llvm-project that referenced this issue Oct 4, 2024
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
xgupta pushed a commit to xgupta/llvm-project that referenced this issue Oct 4, 2024
…es (llvm#110501)

This is a re-application of bc6bd3b which was reverted in
f11abac because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
5 participants