-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++ runtime changes for high warning levels #1902
Conversation
constructor, put the destructor into the .cpp file so the vtable doesn't get generated everywhere.
These changes are for compiling with high warning levels and -Werror. There are no functional changes in this commit. Compiled with gcc 5.4 and clang 3.8. Summary: - Put virtual destructors into the appropriate .cpp file instead of the inline version in the header to avoid many vtables. - Change C-style casts to modern C++ casts. - Add explicit casts in some signed to/from unsigned conversions. - Remove unreached code in BufferedTokenStream.cpp and LexerATNSimulator.cpp. - Remove shadowed variables by qualifying constructor arguments with the name name as a member variable. - Add explicitly defined copy constructors and assignment operators where required by gcc's -Weff-c++. - Use std::numeric_limits<size_t>::max() instead of assigning a negative number. - Remove semi-colons after function definitions. - Remove unneccessary casts. - In preprocessor statements "#if label > value" change to "#if defined(label) && label > value" to avoid warnings about the undefined symbol being seen as zero. - Remove ANTLR4CPP_PUBLIC from "enum class" definitions. - Change the FinalAction move constructor to move instead of copy the _cleanUp std::function object. (A side-effect of explicitly initialising member variables as required by gcc's -Weff-c++. I turned this one off because most constructors needed to be touched, especially the classes implemented with InitializeInstanceFields()). - Mark hex digit conversion functions as file static in guid.cpp.
- Also remove static instance of std::wstring_convert. Access to std::wstring_convert<>::{from,to}_bytes() is not guaranteed to be thread safe.
Just added changes to remove the fix the utf8_to_utf32 auto return type which broke the CI builds. While I was there I removed the static utfConverter instance; access to its {from,to}_bytes methods are not guaranteed thread safe. |
The Travis CI build is failing after an include of <cstddef> -- This is an attempt to work around that by including <stddef.h> instead. Problem not apparent in my FreeBSD environment.
ATN::nextTokens(ATNState* s) updates s->nextTokenWithinRule if the IntervalSet is empty, and then sets it to be read only. However, if the updated IntervalSet value was also empty, it becomes a read-only empty set, causing an exception on a second call on the same state. This was exposed a change I made to make IntervalSet::operator=() respect the _readonly flag. (Which in turn was found by compiling with a high warningly level.) The approach in this update is to perform the update if the updated value is not empty or if the current value is not read only. This preserves the previous behaviour of creating a read-only empty set and working on subsequent calls. It will throw on an attempt to update a read-only value, where previously the read-only value would be silently discarded and set to updatable.
This is a proposed fix to bug antlr#1826 which removes a race condition where multiple threads could update ATNState::nextTokenWithinRule, leading to corrupted std::vector instances in an InstanceSet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's quite a big patch and I haven't commented on all places where a change is needed. Let me rather summarize here and consider changing all places that are affected:
- Consistency: your change from
{}
to= default
is ok, but it should be used consequently. Often you added an empty body instead. These should all be removed and replaced bydefault
. - You added several times an empty d-tor (not even virtual), which makes no sense. It would make sense for virtual d-tors. The other d-tors are just noise.
- You added several times copy c-tor and copy assignment functions only to say they should use the default impl. Why, by all means, did you even add them if you rely on the default impl anyway. That looks so totally superfluous.
- The change from
#if
to#if defined
also seems like useless cosmetic change, or is there anything that requires the latter (I haven't seen any problems or warnings resulting from the original code)? - Code formatting: in the code you added (mostly the new d-tor impls, but also others) you did not consider the coding style in the files -> opening brace at the end of the line, not an own line.
CommonTokenStream::CommonTokenStream(TokenSource *tokenSource, size_t channel) | ||
: BufferedTokenStream(tokenSource), channel(channel) { | ||
CommonTokenStream::CommonTokenStream(TokenSource *tokenSource, size_t channel_in) | ||
: BufferedTokenStream(tokenSource), channel(channel_in) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a number of similar changes already and use a simple trailing underscore. Having names like channel_in
violates the camel case coding style. Can you please change that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
@@ -17,7 +17,7 @@ using namespace antlr4; | |||
DiagnosticErrorListener::DiagnosticErrorListener() : DiagnosticErrorListener(true) { | |||
} | |||
|
|||
DiagnosticErrorListener::DiagnosticErrorListener(bool exactOnly) : exactOnly(exactOnly) { | |||
DiagnosticErrorListener::DiagnosticErrorListener(bool exactOnly_in) : exactOnly(exactOnly_in) { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
UnsupportedOperationException::~UnsupportedOperationException() = default; | ||
EmptyStackException::~EmptyStackException() = default; | ||
CancellationException::~CancellationException() = default; | ||
ParseCancellationException::~ParseCancellationException() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make more sense to specify the default behavior directly on the class declarations, instead of having a list like this somewhere. Also it violates the pattern used throughout the runtime:
class A {
public:
<variables>
<c-tors>
<d-tor>
<other decls>
protected:
...
};
That means each d-tor definition has to be placed after the c-tor defintion in the cpp file or in the declaration part. Please change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with putting the "= default" in the header file is that the vtable is not generated as part of the translation unit. I kept the same order as the pattern used in the header file changes, the difference is that the destructor doesn't have an inline definition.
Would you prefer an empty destructor or "= default" in a .cpp file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see. In such cases make them empty d-tors in the cpp file - properly ordered with the rest of the code.
IllegalStateException(const std::string &msg = "") : RuntimeException(msg) {}; | ||
IllegalStateException(const std::string &msg = "") : RuntimeException(msg) {} | ||
IllegalStateException(IllegalStateException const&) = default; | ||
~IllegalStateException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~IllegalStateException() = default;
Similar for all other exceptions. However, I wonder why you introduced the d-tor on all the exceptions if you use the default implementation anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in the previous comment. To make sure that the vtable for the class with virtual functions is only generated in one translation unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, I never had problems with v-table creation so far. Is this something that changed in a newer compiler? Neither clang nor msc seem to have a problem with that either. Can you point me to an online resource, so I can read up the details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vtables are created OK, and will work. The problem is that there are too many of them.
That has been part of my coding standard for so long, I assumed everyone did it that way. However, finding an online reference that gives this advice has been surprisingly difficult in about half an hour of searching.
The best origin reference I found with a quick look around the office from John Lakos's "Large-Scale C++ Software Design" (from 1996), which has the "minor design rule":
"In every class that declares or is derived from a class that declares a virtual function, explicitly declare the destructor as the first virtual function in the class and define it out of line."
It goes on to describe the reasons, including an anecdote of thousands of instances of the same functions in different translation units caused by duplicated vtables. Locality of reference will probably also suffer (ie. different instances of the same function will lead to less code fitting into the CPU's instruction cache.)
(I read that in 1997 and pretty much adopted all of it, so I have probably been doing this for at least 20 years.)
Item 24 in Scott Meyers' "More Effective C++" also cautions against inline virtual function definitions with the quote "In large systems, this can lead to programs containing hundreds or thousands of copies of a class's vtbl!"
And, of course, Clang has a warning to let you know you're making your object too big:
-Wweak-vtables
Diagnostic text:
warning: A has no out-of-line virtual method definitions; its vtable will be emitted in every translation unit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes a lot of sense. Something new learned. Thanks a lot.
@@ -13,3 +13,7 @@ InputMismatchException::InputMismatchException(Parser *recognizer) | |||
: RecognitionException(recognizer, recognizer->getInputStream(), recognizer->getContext(), | |||
recognizer->getCurrentToken()) { | |||
} | |||
|
|||
InputMismatchException::~InputMismatchException() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistency: should not be defined empty here but use default
in the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in the previous two comments. Happy make a choice between empty destructors and "= default"
@@ -30,15 +31,17 @@ namespace misc { | |||
protected: | |||
/// The list of sorted, disjoint intervals. | |||
std::vector<Interval> _intervals; | |||
bool _readonly; | |||
std::atomic<bool> _readonly; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have a mutex in the ATN for protecting the read-only state change and make the _readonly flag atomic? That's not necessary - one of the two can be removed. I'd prefer atomic
to stay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I comment on this in the bug report. The atomic alone is not sufficient to remove the race condition around updates. This is a pattern we use a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, now I get it. You have that check before the mutex (where atomic comes in) then the mutex lock. So this seems ok then. I'm just a bit nervous about performance impact. The interval set class is one of the lowest level classes and used everywhere in the antlr4 runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is that you can check _readonly orders of magnitude more quickly than taking a lock on a mutex. That's pretty fast, and in the scheme of all the other stuff going on (eg. copying lots of IntervalSet) not too bad.
For optimising IntervalSet, adding IntervalSet(IntervalSet&&)
and IntervalSet& operator=(IntervalSet&&)
and using std::move
is likely to be a big win. The other question is whether the copy constructor should really call addAll()
at all because the source instance will have already done all the checking in add(Interval const&)
.
IntervalSets seem to be copied a lot in the code, this is likely to improve things.
|
||
antlrcpp::Any::Base::~Base() | ||
{ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually fine with having the implementation in a cpp file, so that change is ok. But when you start such a clean up it should be done fully, i.e. move all definitions here from the header.
_enabled = other._enabled; | ||
FinalAction(FinalAction &&other) : | ||
_cleanUp(std::move(other._cleanUp)), _enabled(other._enabled) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding style: opening brace on the end of the line opening the block (as it was originally).
#else | ||
static std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utfConverter; | ||
using UtfConverterType = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>; | ||
using UtfConverterWide = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>::wide_string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this on Windows? UtfConverterWide is defined as std::u32string here, while for non-Win it's a std::wstring_convert.
This part has already been improved and it's better not to include such changes in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged your changes.
@@ -47,7 +47,7 @@ const atn::ATN& XPathLexer::getATN() const { | |||
|
|||
void XPathLexer::action(RuleContext *context, size_t ruleIndex, size_t actionIndex) { | |||
switch (ruleIndex) { | |||
case 4: IDAction(dynamic_cast<antlr4::RuleContext *>(context), actionIndex); break; | |||
case 4: IDAction(context, actionIndex); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generated code. You shouldn't change that manually.
And, btw. this patch of course has functional changes. For the UTF string conversions pending to be merged see my code: https://github.com/mike-lischke/antlr4/blob/master/runtime/Cpp/runtime/src/support/StringUtils.h |
Thanks for the review Mike -- I appreciate that it is a big patch and takes an effort! I'll respond in detail to your comments inline. |
Is this how you would like things to look for destructors in .cpp files as well, or would you prefer empty bodies there?
Which destructor is not virtual? There are two types of warnings leading to the destructor changes:
For the first case, the answer is explicitly declare the destructor and put it into a .cpp file. For the second case, the answer is to explicitly define the function, even if it is empty (for a destructor) or "= default". The intent here is to signal that you have considered what is going on, and say it's OK. Seeing the default constructed operator=() being significantly different to the copy constructor in IntervalSet came directly from dealing with this type of warning. Which in turn led to discovering that there were multiple assignments of cached interval sets in ATN, which led directly to a resolution of the race condition in issue #1826.
Please see above.
Will bow to the local curly brace convention! |
- Change qualifying suffix from "_in" to "_" to confirm with conventions.
- Explicitly cast negative values to size_t instead of using an offset from std::numeric_limts<size_t>::max().
- Change "#if SYM > x" to "#if defined(SYM) && SYM > x" as a pattern. In StringUtils.h this follows the pattern used earlier in the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Only very few things left.
virtual ~Interval() {}; | ||
Interval(Interval const&) = default; | ||
virtual ~Interval(); | ||
Interval& operator=(Interval const&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, confused Interval and IntervalSet, sorry.
{ | ||
if (_readonly) { | ||
throw IllegalStateException("can't alter read only IntervalSet"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, tricky. In fact there is no copy function like this in Java code, so there is no real copy operation there. When they use an assignment they replace a reference. Maybe we should just delete the assignment operator to avoid any unintended behavior and so stay closer to the Java code? If needed one can easily create a new interval set.
@@ -30,15 +31,17 @@ namespace misc { | |||
protected: | |||
/// The list of sorted, disjoint intervals. | |||
std::vector<Interval> _intervals; | |||
bool _readonly; | |||
std::atomic<bool> _readonly; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, now I get it. You have that check before the mutex (where atomic comes in) then the mutex lock. So this seems ok then. I'm just a bit nervous about performance impact. The interval set class is one of the lowest level classes and used everywhere in the antlr4 runtime.
For your discussion about my summary, I believe that's all clarified in the code review now and we are almost there with the patch. |
After going further into the Interval/IntervalSet/readonly and the C++ pass-by-value vs. Java pass reference by value differences, I now see many issues in the code where classes with virtual functions are passed by value and assigned to instances of the base class. For example, ParseTreeMatcher::split() creates an instance of a TextChunk and assigns it to a Chunk(). This is a bug. https://en.wikipedia.org/wiki/Object_slicing I'm looking at fixing this, at least to an extent. Not sure whether to add to this PR, or just start another branch ... |
This is a separate issue and should be handled in an own patch. To be honest the parse tree matcher has not been tested so far and I'm not even sure there are runtime tests for it. So, everything below pattern/ is not fully complete, I'm afraid (because it's rarely used). |
OK, thanks Mike. After picking up Antlr again on Friday afternoon (after a few years), I've gone a lot further than I expected in looking at the C++ runtime. You've answered my internal "how does this work for anyone" question about the pattern/ stuff with "it doesn't". I'll submit my patch once this patch is accepted. Other pieces of work I haven't committed yet:
Unit tests pass for me with all changes. Of these, the first one is the one that is (in my view) possibly worth adding to this pull request. I hope the changes are worthwhile to you -- I think they help improve things, happy to update in response to feedback. (I'm in Sydney so I'm a little out of sync with your timezone.) |
Ah, that sounds promising. Nobody else (other than the runtime authors) has put so much detail work into the C++ runtime yet like you. Please open PRs as you progress and we will review and merge them one by one. |
@parrt This patch applies to the C++ runtime only and improves several aspects in it and fixes a threading issue. The runtime tests succeed, so I think it's worth to be merged. |
These changes are for compiling the C++ runtime and code using the C++ runtime headers with high warning levels and -Werror. There are no functional changes in this commit. Compiled with gcc 5.4 and clang 3.8.