Question: how to generate C++ headers from cppfront? #594

Open
vladimir-kraus opened this issue Aug 16, 2023 · 20 comments

@vladimir-kraus

vladimir-kraus commented Aug 16, 2023

Please correct me if I am mistaken or let me know whether I am lagging behind the status-quo of cppfront design.

I believe that the success of any new "C++ successor" language will be determined not only by how easy and safe it is to write new code, but also by how easy it is to gradually translate existing code from C++ to cppfront. The simplest approach to transforming existing code would be to rewrite classes (typically with the declaration in *.h and the implementation in *.cpp) one by one from C++ to cppfront. This would exactly match the path of language adoption expressed by the smooth ramp-up line that Herb presented at the latest CPP conference.

I have read that cppfront is probably aiming to ditch the idea of C++ headers altogether and rely only on modules. I am no expert in modules (so maybe there is some magical solution...), but I think that ditching C++ headers completely may harm the gradual adoption of cppfront, simply because existing C++ code expects to include class headers. The majority of existing code is written without modules in mind.

So in my opinion, cppfront should have a way to generate headers alongside the *.cpp files. I do not know how this can be done now. I learned that there is something like *.h2 files, so I experimented with this a bit. I took a very naive (and wrong) approach...

// file widget.h2

Widget: type = {
    x : int = 0;
    y : int = 0;
    sum : (this) -> int;  // I naively attempted something like a method declaration here. It does not compile. I know it is wrong.
}
// file widget.cpp2

#include "widget.h2"

// I am naively trying something like a method definition. It does not compile. I know it is wrong.
sum : (this : Widget) -> int = {
    return x + y;
}
// file main.cpp2

#include "widget.h2"

main: (args) = {
    w := Widget();
    std::cout << w.sum();
}

I know this is the wrong approach, but given that documentation does not exist, I did not discover any other solution that would do the same and would work. I would expect cppfront to generate widget.h, widget.cpp and main.cpp, exactly as a programmer would write them by hand in C++, except that with cppfront the code would be much safer and more concise. The benefit would be that other existing C++ code could also include the generated widget.h if it needs it.
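For reference, the hand-written split described above might look roughly like the sketch below. It is a single listing with the file boundaries marked in comments; the file layout and the `widget_usage_example` helper are illustrative assumptions, not cppfront output.

```cpp
#include <cassert>

// ---- widget.h (hypothetical hand-written header): declarations only ----
class Widget {
public:
    int sum() const;  // declared here, defined in widget.cpp
private:
    int x = 0;
    int y = 0;
};

// ---- widget.cpp (hypothetical): would start with #include "widget.h" ----
int Widget::sum() const {
    return x + y;
}

// ---- any other translation unit: only the header is needed to call sum() ----
inline void widget_usage_example() {
    Widget w;
    assert(w.sum() == 0);  // both members default to 0
}
```

Because the header carries no function definitions, it can be included from any number of .cpp files without running into multiple-definition link errors.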

Another possible alternative to the *.h2 files above would be to write just widget.cpp2, containing the class declarations and implementations in one place, with some magical @... directive on the class that causes a *.h file to be generated alongside the *.cpp. The header would contain the class definition and method declarations, and the cpp file would contain the method implementations.

However as I wrote above, maybe some other and better solution already exists in cppfront and I am not aware of it. In that case I would love to learn about how to solve the issue above.

Addendum:

Being a Qt-framework fanboy myself, I would love to see cppfront adopted within the Qt community as well. By this I do not mean that the Qt framework itself would be rewritten in cppfront; this will probably not happen. But I would love to see Qt applications written in cppfront.

The problem is that Qt has its very special ways... It relies heavily on the MOC compiler, which is basically a code generator that parses header files and, based on macros such as Q_OBJECT, generates additional code necessary for the framework to work. To allow interoperability between cppfront and Qt, the following 3 steps would need to take place:

  1. *.cpp2 (and *.h2) files are processed by cppfront. They generate *.cpp and *.h files.
  2. MOC compiler processes all *.h files in the project and where necessary it generates additional *.cpp files with some Qt "magic".
  3. All *.cpp and *.h files (i.e. all those generated by cppfront, generated by MOC and handwritten) are compiled and linked together.

So in order for this to work, it is essential in step 1 to be able to generate somehow the header files so that they can be processed by MOC in step 2...

@JohelEGP
Contributor

I have read that cppfront is probably aiming to ditch the idea of C++ headers altogether and rely only on modules.

I don't think so. From the README:

  • double down on modern C++ (e.g., make C++20 modules and C++23 import std; the default);

    sum : (this) -> int;  // I naively attempted something like a method declaration here. It does not compile. I know it is wrong.

Cpp2 doesn't have separate declaration and definition.

MOC compiler

This is supposed to be taken care of by metafunctions, or reflection in general.

@vladimir-kraus
Author

vladimir-kraus commented Aug 16, 2023

I reacted to this thread #120 where I read about not having headers in cppfront and relying on modules. Maybe I misunderstood.

However, this is what concerns me a bit. I certainly understand the ultimate goal of having a new and much better language (and I believe that cppfront has such high potential). But I still think that practicality and ease of gradual adoption are what can make or break the language. By gradual adoption I mean not only programmers learning the language to use it in new projects, but also the ability to gradually rework existing large codebases piece by piece and elevate them to higher ground. This is what I believe was expressed by the smooth ramp-up line in Herb's presentation.

Any codebase of any C++ project nowadays is basically a large heap of *.h and *.cpp files. Typically each class has one *.h and one *.cpp file. Gradual adoption and transition to cppfront in such a project would mean taking one class at a time and rewriting it in cppfront, without changing any files other than the header and the cpp related to this class. (AFAIK, Kotlin allows such a simple transition from Java, and that is exactly why Kotlin succeeded.) If cppfront does not allow generating C++ headers, I do not think this will be possible. Or is there any other, comparably easy way?

And even if generating C++ headers were not the best practice or the one recommended way of working with cppfront, I think they should exist as a backdoor possibility just for migrating existing code. And yes, as a side effect, they would help win over the large Qt community too, because they would allow using the existing MOC tools.

@JohelEGP
Contributor

Commit 347c1c2 came after that.
So maybe the situation has changed since then.

@vladimir-kraus
Author

Oh, I see. Thank you for pointing me to this commit. Splitting into *.h and *.hpp sounds like a viable solution to my concerns. But I am still doing something wrong here, and it does not work for me.

// file widget.h2

Widget: type = {
    x : int = 0;
    y : int = 0;

    sum : (this) -> int = {
      return x + y;
    }
}

I run ./cppfront widget.h2 (macOS 13.4, clang) and it does not create any widget.hpp, as I would expect from the commit description. It creates only a widget.h file, which also contains the implementations. Including it from more than one source file therefore leads to a multiple-definition link error. Below is the generated *.h file; please note the definitions at the end.


#ifndef WIDGET_H__CPP2
#define WIDGET_H__CPP2


//=== Cpp2 type declarations ====================================================


#include "cpp2util.h"

#line 1 "widget.h2"
class Widget;
    

//=== Cpp2 type definitions and function declarations ===========================

#line 1 "widget.h2"
class Widget {
    private: int x {0}; 
    private: int y {0}; 

    public: [[nodiscard]] auto sum() const -> int;
      
    public: Widget() = default;
    public: Widget(Widget const&) = delete; /* No 'that' constructor, suppress copy */
    public: auto operator=(Widget const&) -> void = delete;


#line 8 "widget.h2"
};


//=== Cpp2 function definitions =================================================


#line 5 "widget.h2"
    [[nodiscard]] auto Widget::sum() const -> int{
      return x + y; 
    }
#endif

So is this my mistake? Am I doing it wrong? Or is this a cppfront bug?

@JohelEGP
Contributor

I think it's just that a later commit made it so there's only .h.

@vladimir-kraus
Author

vladimir-kraus commented Aug 16, 2023

I dived into the cppfront source code... and I found that to produce an .hpp with the definitions I need to use the -pure-cpp2 switch. Well, this is a bit concerning again, because such a *.h2 file cannot, for example, include other C++ (i.e. cpp1) headers: it would not compile with the -pure-cpp2 flag. This makes the gradual transition very cumbersome, because I could only start by translating, one by one, those classes which do not depend on other C++ classes. This is rather limiting...

@vladimir-kraus
Author

vladimir-kraus commented Aug 16, 2023

I have an idea... Wouldn't it be possible to introduce a compiler switch (i.e. not default behavior, but opt-in) that would make cppfront work like this: when processing an xyz.cpp2 file, it would split declarations and definitions, with declarations going to xyz.h and definitions to xyz.cpp? Of course, xyz.cpp would include xyz.h at the very top. This way we could have proper C++ headers, and including them in any other (cpp2 or cpp1) code would not break the one-definition rule.

But this seems too simple for you not to have considered it already. So I guess I have overlooked something important that makes this impossible...

@JohelEGP
Contributor

Ah, you're right about -clean-cpp1.
If you don't use that flag,
you have a single .h generated,
which you can include from Cpp1 code.

@vladimir-kraus
Author

vladimir-kraus commented Aug 17, 2023

Well, yes and no.

Because it also contains definitions, you can include it from only one cpp1 file. Otherwise you will break the one-definition rule and the application will refuse to link. That is a very severe limitation on the usability of such a header.

@JohelEGP
Contributor

You're right.
It seems that we're specifically violating

14 For any definable item D with definitions in multiple translation units,

  (14.1) if D is a non-inline non-templated function or variable, or
  (14.2) if the definitions in different translation units do not satisfy the following requirements,

the program is ill-formed; a diagnostic is required only if the definable item is attached to a named module and a prior definition is reachable at the point where a later definition occurs.
Given such an item, for all definitions of D, or, if D is an unnamed enumeration, for all definitions of D that are reachable at any given program point, the following requirements shall be satisfied.

@JohelEGP
Contributor

JohelEGP commented Aug 17, 2023

See https://cpp2.godbolt.org/z/jeEobqWq9.
It has this lib.h2:

#define GREET inline greet
lib: namespace = {
  GREET: () -> std::string_view = "Hello, World!\n";
}

And it works because the function in this header is inline.
But if we remove the inline, this is the error:

other.cpp:(.text+0x0): multiple definition of `lib::greet()';

Could it be that functions in headers should be inline by default?
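As a rough Cpp1 illustration of the difference under discussion: an `inline` function may be defined in a header that several translation units include, while a non-inline function may only be declared there and must be defined in exactly one .cpp file. The `farewell` function, the `lib_usage_example` helper and the file split in the comments are hypothetical, added only for this sketch.

```cpp
#include <cassert>
#include <string_view>

// ---- lib.h (hypothetical hand-written Cpp1 header) ----
namespace lib {

// `inline` permits an identical definition in every translation unit
// that includes this header; the linker keeps a single copy.
inline std::string_view greet() {
    return "Hello, World!\n";
}

// Declaration only: safe to include anywhere, but exactly one
// .cpp file has to provide the definition.
std::string_view farewell();

}  // namespace lib

// ---- lib.cpp (hypothetical): the single definition of the non-inline function ----
std::string_view lib::farewell() {
    return "Goodbye!\n";
}

inline void lib_usage_example() {
    assert(lib::greet() == "Hello, World!\n");
    assert(lib::farewell() == "Goodbye!\n");
}
```

Note that `inline` here is about permitting multiple definitions; whether the compiler actually expands calls in place is a separate optimization decision.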

@vladimir-kraus
Author

vladimir-kraus commented Aug 17, 2023

But we do not want all functions in headers to be inlined, that would definitely harm application size and would cause other problems. I am afraid this is not the way to go...

I think there should be some other strategy for transition/migration of existing large C++ projects to cppfront.


Let me think out loud.

We will have three stages of migrating an existing C++ project:

Stage 1 - Pure cpp1 project. It is basically a pile of *.cpp and *.h files, with declarations in headers and definitions in cpps. Cpps can #include the headers, and headers can also #include other headers. During the build, all cpps are compiled and then linked together.

Stage 2 - Mixed cpp1 and cpp2 project. Here the cpp2 (and h2) files are transpiled to cpp1 format, and then everything compiles and links the same way as in stage 1. For the original cpp1 code to be able to use (#include and link) cpp2 code, we definitely need to be able to produce cpp1 headers from cpp2 files somehow. Modules probably cannot help us, because most existing projects do not use modules; they still include traditional headers.

Stage 3 - Pure cpp2 project. Maybe not all projects will reach this final stage. But a successful migration is one that leaves only the absolutely necessary minimum of cpp1 code, with the rest being in cpp2.


Stage 2 is actually where the migration will happen. It has to be done gradually, in very simple and small steps. At each step, the application must be fully buildable and functional. I believe that a strict requirement for this incremental migration is that you can change each single class in the project individually, by rewriting its *.h and *.cpp to *.cpp2 and using it WITHOUT TOUCHING any other code (I mean other than the header and cpp where the class being migrated is defined and implemented). This "WITHOUT TOUCHING" requirement strictly means that we must be able to produce the header so that it can be #included from other files. And of course we also need to be able to produce a cpp file with the definitions. The definitions cannot be in the header, because that would break the one-definition rule.

How to achieve this? Let's now forget about h2 files because I think they cannot help us here. They are probably only useful in the final stage 3, in pure cpp2 code.

So let's assume we have one class in one cpp2 file. We need to be able to generate the *.cpp with function definitions and the *.h with declarations (side note: I am considering a class definition as a declaration here, since ODR does not apply to it...). We basically have two tools:

a) cppfront compiler switches/flags
b) some cpp2 in-code directives

Using these two we should be able to transform each *.cpp2 file into one *.h and one *.cpp, in a similar form to what a human programmer would write, only with cpp2 it will be much safer code.

The compiler switches should be able to instruct the compiler to produce both the header and the cpp. They should also define how strict the compiler is when checking the code in a mixed cpp2 file. For example, when it encounters a non-conforming line, should it scream an error, or should it just ignore it and flush it to the generated *.h or *.cpp without any error, because it may very well be fully correct cpp1 code? Take the Q_OBJECT macro in Qt classes: currently cppfront screams an error for such a line. But I should be able to tell the compiler to just flush it to the cpp1 header without any questions. So there should be some cppfront switch (maybe called "liberal-mode" or something similar) which passes any line that it does not recognize as valid cpp2 code directly to the output file.

As the migration goes on and there is more cpp2 code and less cpp1 code in the mixed files, these "liberal" switches could be gradually switched off to enforce stricter rules. But this should be done on an individual basis, file by file.

Unlike compiler switches/flags which operate on per-file basis, in-code directives can work on per-line or per-block basis and inform the cppfront compiler how to treat individual pieces of code. For example whether they should be output to *.h or *.cpp.


Alright, this was just thinking out loud. I do not have any concrete idea or design for these flags or directives in mind. I just believe that what I wrote is a necessary (not sufficient) condition for possible gradual migration of any cpp1 project.

@SebastianTroy

SebastianTroy commented Aug 17, 2023 via email

@vladimir-kraus
Author

@SebastianTroy I am no expert in modules, so allow me to ask a question: is it possible to migrate a large codebase from header-based to module-based gradually as well, one file at a time? Or does the whole codebase have to be migrated to modules as a prerequisite for migrating to cppfront?

@SebastianTroy

SebastianTroy commented Aug 17, 2023 via email

@JohelEGP
Contributor

JohelEGP commented Aug 17, 2023

I think there's a bug you can take advantage of.
Commit 347c1c2 says

  • consumption: any #include "something.h2" is emitted as an #include "something.h" in its original location relative to the Cpp1 source, and an #include "something.hpp" at the beginning of the Cpp2 definitions section

In -pure-cpp2 mode, the "#include "something.hpp"" isn't being emitted.
So, right now, you can compile with Cppfront all your .cpp2 and .h2 source files,
and in a single Cpp1 TU include all .hpp source files generated from .h2 source files.

So this works (https://cpp2.godbolt.org/z/o544qxfGj):
lib.h2:

lib: namespace = {
greet: () -> std::string_view = "Hello, World!\n";
}

main.cpp2:

#include "lib.h2"
main: () -> int = { std::cout << lib::greet(); }

other.cpp:

#include "lib.h"
#include "lib.hpp"

Build commands:

cppfront -p lib.h2
cppfront -p main.cpp2
$CXX main.cpp other.cpp

Running the executable prints:

Hello, World!

Using -pure-cpp2 does mean that the Cpp2 source file can't consume Cpp1 libraries (other than the C++ standard's).

@Grinvase

Grinvase commented Oct 1, 2023

@vladimir-kraus I ended up modifying cppfront.cpp to output everything before //=== Cpp2 function definitions into a new .h file and to add an #include of that header at the top of the generated .cpp file.
Example: 7975ec7

Results: Given test.cpp2 (copied from https://github.com/hsutter/cppfront/blob/main/regression-tests/pure2-hello.cpp2):

decorate: (inout s: std::string) = {
    s = "[" + s + "]";
}

Running the command cppfront test.cpp2 -split-header-file produces test.h and test.cpp:
test.h:

#ifndef TEST_CPP_CPP2
#define TEST_CPP_CPP2

//=== Cpp2 type declarations ====================================================

#include "cpp2util.h"

//=== Cpp2 type definitions and function declarations ===========================

auto decorate(std::string& s) -> void;
    
#endif

test.cpp:

#include "test.h"

//=== Cpp2 function definitions =================================================

auto decorate(std::string& s) -> void{
    s = "[" + s + "]";
}
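The same split could also be approximated as a post-processing step on cppfront's generated text, without patching cppfront itself. The sketch below is only an illustration under stated assumptions: `SplitOutput`, `split_at_definitions` and `split_usage_example` are hypothetical names (not part of cppfront or of the patch above), and the code assumes the generated text contains the "//=== Cpp2 function definitions" banner and, if an include guard is present, a closing `#endif` after the definitions.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not part of cppfront.
struct SplitOutput {
    std::string header;  // everything before the definitions banner
    std::string source;  // #include of the header, then the definitions
};

inline SplitOutput split_at_definitions(std::string_view generated,
                                        std::string_view header_name) {
    constexpr std::string_view marker = "//=== Cpp2 function definitions";
    constexpr std::string_view guard_end = "#endif";

    SplitOutput out;
    const auto pos = generated.find(marker);
    if (pos == std::string_view::npos) {
        // No definitions section found: treat everything as header material.
        out.header = std::string(generated);
        return out;
    }

    std::string_view decls = generated.substr(0, pos);
    std::string_view defs = generated.substr(pos);

    // Assumption: the last #endif after the definitions closes the include
    // guard, so it is moved from the definitions to the header.
    const auto end_pos = defs.rfind(guard_end);
    const bool has_guard = end_pos != std::string_view::npos;
    if (has_guard) {
        defs = defs.substr(0, end_pos);
    }

    out.header = std::string(decls);
    if (has_guard) {
        out.header += std::string(guard_end) + "\n";
    }
    out.source = "#include \"" + std::string(header_name) + "\"\n\n" + std::string(defs);
    return out;
}

inline void split_usage_example() {
    const std::string generated =
        "#ifndef G\n#define G\nauto f() -> void;\n"
        "//=== Cpp2 function definitions ===\nauto f() -> void{}\n#endif\n";
    const SplitOutput result = split_at_definitions(generated, "test.h");
    assert(result.header == "#ifndef G\n#define G\nauto f() -> void;\n#endif\n");
    assert(result.source.rfind("#include \"test.h\"", 0) == 0);  // starts with the include
}
```

A text-level split like this is fragile (for example, it does not handle `#line` directives or a `#endif` that belongs to user code), which is one reason a switch inside cppfront itself would be more robust.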

@JohelEGP
Contributor

JohelEGP commented Nov 5, 2023

I found the solution.
It's the opening comment's code, slightly modified.

widget.h2:

Widget: type = {
  x : int = 0;
  y : int = 0;
  sum : (this) -> int = {
    return x + y;
  }
}

widget.cpp2:

#include "widget.h2"
#include "widget.hpp"

main.cpp2:

#include "widget.h"

main: (args) = {
  w := Widget();
  std::cout << w.sum();
}

According to commit 347c1c2,
a .h2 generates an .h with the declarations and an .hpp with the definitions
(in the current output, the .h holds Phase 1 "Cpp2 type declarations" and Phase 2 "Cpp2 type definitions and function declarations", and the .hpp holds the "Cpp2 function definitions" section).
(This actually only happens with -pure-cpp2; otherwise you get it all in the .h.
The commit also says that the .hpp is #included in the definitions, but it seems that never happens.)

If we compile widget.h2 with -pure-cpp2, it'll generate widget.h and widget.hpp.
widget.cpp2 now just includes both generated headers to provide the implementation.
In main.cpp2, we include widget.h to get only the interface, and link to the library with the implementation.

This is the CMake (https://cpp2.godbolt.org/z/TTo1qcq6c):

# See <https://github.com/hsutter/cppfront/issues/594>.
add_library(widget)
set(JEGP_CXX2_FLAGS "-p") # Pure to split implementation to `.hpp`.
jegp_cpp2_target_sources(widget PRIVATE "widget.h2")
set(JEGP_CXX2_FLAGS "") # Non-pure to accept `#include widget.hpp`.
jegp_cpp2_target_sources(widget PRIVATE "widget.cpp2")

add_executable(main)
set(JEGP_CXX2_FLAGS "")
jegp_cpp2_target_sources(main PRIVATE "main.cpp2")
target_include_directories(main PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
target_link_libraries(main PRIVATE widget) # Link to `widget` for implementation.

I've been warming up with libraries for #797 (reply in thread).

@SavenkovIgor

Could anyone provide an update on the current status of this issue? Are there any developments or plans concerning the generation of headers and ensuring interoperability with tools that depend on C++ headers?

From what I understand, it is currently not possible to have cppfront generate both header and source files without -p[ure-cpp2] mode and without resorting to external patches.

JohelEGP referenced this issue Oct 6, 2024
Requiring `=` has been a widely-requested change, and I think it does make the code clearer to read. See the Cpp2 code changes in this commit for examples.

For example, `f: () expr;` must now be written as `f: () = expr;`

For a function whose body is a single expression, the default return type (i.e., if not specified) is now `-> forward _`, which is Cpp1 `-> decltype(auto)` (which, importantly, can deduce a value).

With this change, single-expression function bodies without `{ }` are still legal for any function, but as of this commit we have a clearer distinction in their use (which is reflected in the updates to the regression tests and other Cpp2 code in this commit):

    - It further encourages single-expression function bodies without `{ }` for unnamed function expressions (lambdas), by making more of those cases Do the Right Thing that the programmer intended.

    - It naturally discourages their overuse for named functions, because it will more often cause a compiler warning or error:

       - a warning that callers are not using a returned result, when the function is called without using its value

       - an error that a deduced return type makes the function order-dependent, when the function is called from earlier in the source file... this is because a deduced return type creates a dependency on the body, and it is inherent (not an artifact of either Cpp1 or Cpp2)

       But for those who liked that style, fear not, usually the answer is to just put the two characters `{ }` around the body... see examples in this commit, which I have to admit most are probably more readable even though I'm one of the ones who liked omitting the braces.

       Note: I realize that further evolution may not allow the `{ }`-free style for named functions. That may well be a reasonable further outcome.

I think both of these are positive improvements.
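To make the `-> forward _` / `-> decltype(auto)` remark in the referenced commit message concrete, here is a small Cpp1 sketch of how `decltype(auto)` deduces either a value or a reference, depending on the returned expression, while plain `auto` always yields a value. All function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

inline int make_value() { return 42; }

inline int& counter() {
    static int value = 0;
    return value;
}

// decltype(auto) keeps the exact type of the returned expression.
inline decltype(auto) forward_value() { return make_value(); }  // deduces int
inline decltype(auto) forward_ref()   { return counter(); }     // deduces int&

// Plain auto drops the reference and returns a copy.
inline auto copy_of_counter() { return counter(); }             // deduces int

static_assert(std::is_same_v<decltype(forward_value()), int>);
static_assert(std::is_same_v<decltype(forward_ref()), int&>);
static_assert(std::is_same_v<decltype(copy_of_counter()), int>);

inline void deduction_usage_example() {
    forward_ref() = 7;  // writes through the deduced reference
    assert(counter() == 7);
    assert(forward_value() == 42);
}
```

All three functions are defined before they are used: a function with a deduced return type cannot be called before its body has been seen, which is the order dependence the commit message mentions.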