-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang-format] Handle common C++ non-keyword types as such #83709
Conversation
@llvm/pr-subscribers-clang-format Author: Owen Pan (owenca) ChangesFixes #83400. Full diff: https://github.com/llvm/llvm-project/pull/83709.diff 7 Files Affected:
diff --git a/clang/lib/Format/FormatToken.cpp b/clang/lib/Format/FormatToken.cpp
index 56a7b2d6387765..02952bd20acf9a 100644
--- a/clang/lib/Format/FormatToken.cpp
+++ b/clang/lib/Format/FormatToken.cpp
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
return nullptr;
}
+// Sorted common C++ non-keyword types.
+static SmallVector<StringRef> CppNonKeywordTypes = {
+ "byte", "int16_t", "int32_t", "int64_t", "int8_t",
+ "size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",
+};
+
// FIXME: This is copy&pasted from Sema. Put it in a common place and remove
// duplication.
-bool FormatToken::isSimpleTypeSpecifier() const {
+bool FormatToken::isSimpleTypeSpecifier(bool IsCpp) const {
switch (Tok.getKind()) {
case tok::kw_short:
case tok::kw_long:
@@ -66,13 +72,17 @@ bool FormatToken::isSimpleTypeSpecifier() const {
case tok::kw_decltype:
case tok::kw__Atomic:
return true;
+ case tok::identifier:
+ return IsCpp && std::binary_search(CppNonKeywordTypes.begin(),
+ CppNonKeywordTypes.end(), TokenText);
default:
return false;
}
}
-bool FormatToken::isTypeOrIdentifier() const {
- return isSimpleTypeSpecifier() || Tok.isOneOf(tok::kw_auto, tok::identifier);
+bool FormatToken::isTypeOrIdentifier(bool IsCpp) const {
+ return isSimpleTypeSpecifier(IsCpp) ||
+ Tok.isOneOf(tok::kw_auto, tok::identifier);
}
bool FormatToken::isBlockIndentedInitRBrace(const FormatStyle &Style) const {
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index 31245495041960..2bc136c51d23f1 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -674,9 +674,9 @@ struct FormatToken {
}
/// Determine whether the token is a simple-type-specifier.
- [[nodiscard]] bool isSimpleTypeSpecifier() const;
+ [[nodiscard]] bool isSimpleTypeSpecifier(bool IsCpp) const;
- [[nodiscard]] bool isTypeOrIdentifier() const;
+ [[nodiscard]] bool isTypeOrIdentifier(bool IsCpp) const;
bool isObjCAccessSpecifier() const {
return is(tok::at) && Next &&
diff --git a/clang/lib/Format/QualifierAlignmentFixer.cpp b/clang/lib/Format/QualifierAlignmentFixer.cpp
index 0c63d330b5aed4..834ae115908856 100644
--- a/clang/lib/Format/QualifierAlignmentFixer.cpp
+++ b/clang/lib/Format/QualifierAlignmentFixer.cpp
@@ -268,11 +268,13 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeRight(
if (isPossibleMacro(TypeToken))
return Tok;
+ const bool IsCpp = Style.isCpp();
+
// The case `const long long int volatile` -> `long long int const volatile`
// The case `long const long int volatile` -> `long long int const volatile`
// The case `long long volatile int const` -> `long long int const volatile`
// The case `const long long volatile int` -> `long long int const volatile`
- if (TypeToken->isSimpleTypeSpecifier()) {
+ if (TypeToken->isSimpleTypeSpecifier(IsCpp)) {
// The case `const decltype(foo)` -> `const decltype(foo)`
// The case `const typeof(foo)` -> `const typeof(foo)`
// The case `const _Atomic(foo)` -> `const _Atomic(foo)`
@@ -280,8 +282,10 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeRight(
return Tok;
const FormatToken *LastSimpleTypeSpecifier = TypeToken;
- while (isQualifierOrType(LastSimpleTypeSpecifier->getNextNonComment()))
+ while (isQualifierOrType(LastSimpleTypeSpecifier->getNextNonComment(),
+ IsCpp)) {
LastSimpleTypeSpecifier = LastSimpleTypeSpecifier->getNextNonComment();
+ }
rotateTokens(SourceMgr, Fixes, Tok, LastSimpleTypeSpecifier,
/*Left=*/false);
@@ -291,7 +295,7 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeRight(
// The case `unsigned short const` -> `unsigned short const`
// The case:
// `unsigned short volatile const` -> `unsigned short const volatile`
- if (PreviousCheck && PreviousCheck->isSimpleTypeSpecifier()) {
+ if (PreviousCheck && PreviousCheck->isSimpleTypeSpecifier(IsCpp)) {
if (LastQual != Tok)
rotateTokens(SourceMgr, Fixes, Tok, LastQual, /*Left=*/false);
return Tok;
@@ -408,11 +412,12 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeLeft(
// The case `volatile long long const int` -> `const volatile long long int`
// The case `const long long volatile int` -> `const volatile long long int`
// The case `long volatile long int const` -> `const volatile long long int`
- if (TypeToken->isSimpleTypeSpecifier()) {
+ if (const bool IsCpp = Style.isCpp();
+ TypeToken->isSimpleTypeSpecifier(IsCpp)) {
const FormatToken *LastSimpleTypeSpecifier = TypeToken;
while (isConfiguredQualifierOrType(
LastSimpleTypeSpecifier->getPreviousNonComment(),
- ConfiguredQualifierTokens)) {
+ ConfiguredQualifierTokens, IsCpp)) {
LastSimpleTypeSpecifier =
LastSimpleTypeSpecifier->getPreviousNonComment();
}
@@ -611,15 +616,15 @@ void prepareLeftRightOrderingForQualifierAlignmentFixer(
}
bool LeftRightQualifierAlignmentFixer::isQualifierOrType(
- const FormatToken *const Tok) {
- return Tok && (Tok->isSimpleTypeSpecifier() || Tok->is(tok::kw_auto) ||
+ const FormatToken *const Tok, bool IsCpp) {
+ return Tok && (Tok->isSimpleTypeSpecifier(IsCpp) || Tok->is(tok::kw_auto) ||
isQualifier(Tok));
}
bool LeftRightQualifierAlignmentFixer::isConfiguredQualifierOrType(
- const FormatToken *const Tok,
- const std::vector<tok::TokenKind> &Qualifiers) {
- return Tok && (Tok->isSimpleTypeSpecifier() || Tok->is(tok::kw_auto) ||
+ const FormatToken *const Tok, const std::vector<tok::TokenKind> &Qualifiers,
+ bool IsCpp) {
+ return Tok && (Tok->isSimpleTypeSpecifier(IsCpp) || Tok->is(tok::kw_auto) ||
isConfiguredQualifier(Tok, Qualifiers));
}
diff --git a/clang/lib/Format/QualifierAlignmentFixer.h b/clang/lib/Format/QualifierAlignmentFixer.h
index e922d800559510..e1cc27e62b13a0 100644
--- a/clang/lib/Format/QualifierAlignmentFixer.h
+++ b/clang/lib/Format/QualifierAlignmentFixer.h
@@ -71,10 +71,11 @@ class LeftRightQualifierAlignmentFixer : public TokenAnalyzer {
tok::TokenKind QualifierType);
// Is the Token a simple or qualifier type
- static bool isQualifierOrType(const FormatToken *Tok);
+ static bool isQualifierOrType(const FormatToken *Tok, bool IsCpp = true);
static bool
isConfiguredQualifierOrType(const FormatToken *Tok,
- const std::vector<tok::TokenKind> &Qualifiers);
+ const std::vector<tok::TokenKind> &Qualifiers,
+ bool IsCpp = true);
// Is the Token likely a Macro
static bool isPossibleMacro(const FormatToken *Tok);
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 04f0374b674e53..3c5d7326224eca 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -562,7 +562,7 @@ class AnnotatingParser {
(CurrentToken->is(tok::l_paren) && CurrentToken->Next &&
CurrentToken->Next->isOneOf(tok::star, tok::amp, tok::caret));
if ((CurrentToken->Previous->isOneOf(tok::kw_const, tok::kw_auto) ||
- CurrentToken->Previous->isSimpleTypeSpecifier()) &&
+ CurrentToken->Previous->isSimpleTypeSpecifier(IsCpp)) &&
!(CurrentToken->is(tok::l_brace) ||
(CurrentToken->is(tok::l_paren) && !ProbablyFunctionTypeLParen))) {
Contexts.back().IsExpression = false;
@@ -2573,7 +2573,7 @@ class AnnotatingParser {
return true;
// MyClass a;
- if (PreviousNotConst->isSimpleTypeSpecifier())
+ if (PreviousNotConst->isSimpleTypeSpecifier(IsCpp))
return true;
// type[] a in Java
@@ -2704,9 +2704,10 @@ class AnnotatingParser {
}
// Heuristically try to determine whether the parentheses contain a type.
- auto IsQualifiedPointerOrReference = [](FormatToken *T) {
+ auto IsQualifiedPointerOrReference = [this](FormatToken *T) {
// This is used to handle cases such as x = (foo *const)&y;
- assert(!T->isSimpleTypeSpecifier() && "Should have already been checked");
+ assert(!T->isSimpleTypeSpecifier(IsCpp) &&
+ "Should have already been checked");
// Strip trailing qualifiers such as const or volatile when checking
// whether the parens could be a cast to a pointer/reference type.
while (T) {
@@ -2738,7 +2739,7 @@ class AnnotatingParser {
bool ParensAreType =
!Tok.Previous ||
Tok.Previous->isOneOf(TT_TemplateCloser, TT_TypeDeclarationParen) ||
- Tok.Previous->isSimpleTypeSpecifier() ||
+ Tok.Previous->isSimpleTypeSpecifier(IsCpp) ||
IsQualifiedPointerOrReference(Tok.Previous);
bool ParensCouldEndDecl =
Tok.Next->isOneOf(tok::equal, tok::semi, tok::l_brace, tok::greater);
@@ -3595,7 +3596,8 @@ static bool isFunctionDeclarationName(bool IsCpp, const FormatToken &Current,
if (!Current.Tok.getIdentifierInfo())
return false;
- auto skipOperatorName = [](const FormatToken *Next) -> const FormatToken * {
+ auto skipOperatorName =
+ [IsCpp](const FormatToken *Next) -> const FormatToken * {
for (; Next; Next = Next->Next) {
if (Next->is(TT_OverloadedOperatorLParen))
return Next;
@@ -3614,7 +3616,7 @@ static bool isFunctionDeclarationName(bool IsCpp, const FormatToken &Current,
Next = Next->Next;
continue;
}
- if ((Next->isSimpleTypeSpecifier() || Next->is(tok::identifier)) &&
+ if ((Next->isSimpleTypeSpecifier(IsCpp) || Next->is(tok::identifier)) &&
Next->Next && Next->Next->isPointerOrReference()) {
// For operator void*(), operator char*(), operator Foo*().
Next = Next->Next;
@@ -3712,7 +3714,7 @@ static bool isFunctionDeclarationName(bool IsCpp, const FormatToken &Current,
Tok = Tok->MatchingParen;
continue;
}
- if (Tok->is(tok::kw_const) || Tok->isSimpleTypeSpecifier() ||
+ if (Tok->is(tok::kw_const) || Tok->isSimpleTypeSpecifier(IsCpp) ||
Tok->isOneOf(TT_PointerOrReference, TT_StartOfName, tok::ellipsis,
TT_TypeName)) {
return true;
@@ -4376,7 +4378,7 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
if (Left.Tok.isLiteral())
return true;
// for (auto a = 0, b = 0; const auto & c : {1, 2, 3})
- if (Left.isTypeOrIdentifier() && Right.Next && Right.Next->Next &&
+ if (Left.isTypeOrIdentifier(IsCpp) && Right.Next && Right.Next->Next &&
Right.Next->Next->is(TT_RangeBasedForLoopColon)) {
return getTokenPointerOrReferenceAlignment(Right) !=
FormatStyle::PAS_Left;
@@ -4419,8 +4421,8 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
if (Right.is(tok::l_brace) && Right.is(BK_Block))
return true;
// for (auto a = 0, b = 0; const auto& c : {1, 2, 3})
- if (Left.Previous && Left.Previous->isTypeOrIdentifier() && Right.Next &&
- Right.Next->is(TT_RangeBasedForLoopColon)) {
+ if (Left.Previous && Left.Previous->isTypeOrIdentifier(IsCpp) &&
+ Right.Next && Right.Next->is(TT_RangeBasedForLoopColon)) {
return getTokenPointerOrReferenceAlignment(Left) !=
FormatStyle::PAS_Right;
}
@@ -4458,7 +4460,8 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
if (Right.isPointerOrReference()) {
const FormatToken *Previous = &Left;
while (Previous && Previous->isNot(tok::kw_operator)) {
- if (Previous->is(tok::identifier) || Previous->isSimpleTypeSpecifier()) {
+ if (Previous->is(tok::identifier) ||
+ Previous->isSimpleTypeSpecifier(IsCpp)) {
Previous = Previous->getPreviousNonComment();
continue;
}
@@ -4647,7 +4650,7 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
if (!Style.isVerilog() &&
(Left.isOneOf(tok::identifier, tok::greater, tok::r_square,
tok::r_paren) ||
- Left.isSimpleTypeSpecifier()) &&
+ Left.isSimpleTypeSpecifier(IsCpp)) &&
Right.is(tok::l_brace) && Right.getNextNonComment() &&
Right.isNot(BK_Block)) {
return false;
diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp
index 2ce291da11b86a..0b96dd80252ad1 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -1866,7 +1866,7 @@ void UnwrappedLineParser::parseStructuralElement(
nextToken();
// Block return type.
if (FormatTok->Tok.isAnyIdentifier() ||
- FormatTok->isSimpleTypeSpecifier()) {
+ FormatTok->isSimpleTypeSpecifier(IsCpp)) {
nextToken();
// Return types: pointers are ok too.
while (FormatTok->is(tok::star))
@@ -2222,7 +2222,7 @@ bool UnwrappedLineParser::tryToParseLambda() {
bool InTemplateParameterList = false;
while (FormatTok->isNot(tok::l_brace)) {
- if (FormatTok->isSimpleTypeSpecifier()) {
+ if (FormatTok->isSimpleTypeSpecifier(IsCpp)) {
nextToken();
continue;
}
@@ -3415,7 +3415,7 @@ bool clang::format::UnwrappedLineParser::parseRequires() {
break;
}
default:
- if (PreviousNonComment->isTypeOrIdentifier()) {
+ if (PreviousNonComment->isTypeOrIdentifier(IsCpp)) {
// This is a requires clause.
parseRequiresClause(RequiresToken);
return true;
@@ -3478,7 +3478,7 @@ bool clang::format::UnwrappedLineParser::parseRequires() {
--OpenAngles;
break;
default:
- if (NextToken->isSimpleTypeSpecifier()) {
+ if (NextToken->isSimpleTypeSpecifier(IsCpp)) {
FormatTok = Tokens->setPosition(StoredPosition);
parseRequiresExpression(RequiresToken);
return false;
@@ -3962,8 +3962,8 @@ void UnwrappedLineParser::parseRecord(bool ParseAsExpr) {
}
if (FormatTok->is(tok::l_square)) {
FormatToken *Previous = FormatTok->Previous;
- if (!Previous ||
- !(Previous->is(tok::r_paren) || Previous->isTypeOrIdentifier())) {
+ if (!Previous || (Previous->isNot(tok::r_paren) &&
+ !Previous->isTypeOrIdentifier(IsCpp))) {
// Don't try parsing a lambda if we had a closing parenthesis before,
// it was probably a pointer to an array: int (*)[].
if (!tryToParseLambda())
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index c736dac8dabf21..49382b595927e1 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -620,6 +620,12 @@ TEST_F(TokenAnnotatorTest, UnderstandsCasts) {
ASSERT_EQ(Tokens.size(), 8u) << Tokens;
EXPECT_TOKEN(Tokens[3], tok::r_paren, TT_Unknown);
EXPECT_TOKEN(Tokens[4], tok::amp, TT_BinaryOperator);
+
+ Tokens = annotate("#define FOO(bar) foo((uint64_t)&bar)");
+ ASSERT_EQ(Tokens.size(), 15u) << Tokens;
+ EXPECT_TOKEN(Tokens[10], tok::r_paren, TT_CastRParen);
+ // FIXME: should be TT_PointerOrReference instead.
+ EXPECT_TOKEN(Tokens[11], tok::amp, TT_UnaryOperator);
}
TEST_F(TokenAnnotatorTest, UnderstandsDynamicExceptionSpecifier) {
|
Another possibility to consider for the original bug: |
I don’t think that would work as |
OK, but the context is deciding whether I don't think this PR fixes #83400 -- handling a small handful of common types won't address the same issue for the much larger class of cases where the type name is not one of these types. Eg, in #83400 itself the examples included: - static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type)*VariantInternal::get_int(v); } \
+ static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type) * VariantInternal::get_int(v); } \ But looking for a parenthesized single identifier addresses all of the examples in the issue. |
This patch does not only fix formatting of C-casting to a C++ standard type. It correctly identifies (most of) such types and might have fixed other kinds of bugs. For casting to a user-defined type, the TypeNames option can be used. |
Sure, this patch seems like a good change. But it does not fix #83400. |
It does fix the example given. For other diffs involving user-defined types, the user need to use the TypeNames option. (If that doesn't work, a separate issue should be filed.) |
I don't think it would work, but you are welcome to submit a patch to prove me wrong. 🙂 |
Removed `byte` and added `intptr_t`/`uintptr_t`.
FormatToken::isTypeName(). Also added test cases for user-defined types.
#83400 has 6 real-world examples. This patch fixes none of them. It also has a reduced testcase, which this patch does fix. But fixing the reduced testcase without fixing the real-world examples is not fixing the bug. |
You took what I said out of context. I'll reiterate it below:
So again, this patch fixes the test case that "reproduces the issue" (in the issue author's own words), and I already explained to you that the other 6 diffs (which are of the same construct, i.e. casting an address to a user-defined type in a macro definition) require the user to use the TypeNames option. (See the added test cases in fcae75d above.) If the author of #83400 is still not satisfied, there's nothing to stop them from reopening the issue or filing a new issue. However, that's highly unlikely as they already expressed their view on it:
|
+1 for this, it will be the 95%/5% rule, you might think all cases of (identifier) is a cast, but I'm pretty sure it won't be I'm with @owenca on this, this patch will move us closer, meaning users don't need to add common types to TypeNames |
clang/lib/Format/FormatToken.cpp
Outdated
return isSimpleTypeSpecifier() || Tok.isOneOf(tok::kw_auto, tok::identifier); | ||
// Sorted common C++ non-keyword types. | ||
static SmallVector<StringRef> CppNonKeywordTypes = { | ||
"int16_t", "int32_t", "int64_t", "int8_t", "intptr_t", "size_t", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ssize_t too? should everything be considered? https://en.cppreference.com/w/cpp/types/integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ssize_t
is a POSIX type. Regarding the fixed-width integer types, I wanted to start with a few common ones. (I will add ptrdiff_t
though.)
EXPECT_TOKEN(Tokens[10], tok::r_paren, TT_CastRParen); | ||
EXPECT_TOKEN(Tokens[11], tok::amp, TT_UnaryOperator); | ||
|
||
Tokens = annotate("#define FOO(bar) foo((time_t) & bar)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't have time_t in the list did you? does that mean we are just checking this one uint64_t type above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intentionally excluded time_t
, but on second thought I'll add it, along with clock_t
.
Specifically, what I suggested was: when disambiguating whether
As I said before, I think special-casing these typedefs is a good thing - it's reasonable for clang-format to assume they're type names. But I don't think this moves us much closer to fixing the bug, because most casts won't be to one of these typedefs. I do wonder, though -- there's a lot of churn in this patch adding and wiring through a new parameter. Can the |
Just thinking out loud and maybe not for this patch but for a more general solution, what we seem to never do is collect information about types local to the file as we go; For example what if a first pass identified types from declarations and function argument declarations, return types, template arguments and added them to a structure that would have some probability that a given identifier was actually a type. Foo x; This would give "Foo" some sort of weight to it being a type, which then would give more promenance to it being a cast than a binary operation. The problem I see and why I think I agree more with @owenca is people love to wrap logical operations in '()' so I think assuming (a) is a cast is hard especailly for the Given the example below the macro argument doesn't tell me its a type, but the By the time we'd arrived at the
There is some precident for this in using the "majority" rule of something local to the file (from Jaspers original slides)
Any thoughts? |
I don't see how that would work. Please note that the TypeNames option is not specific to C/C++/Objective-C. Otherwise, I could simply change |
I'm kind of skeptical because we can't even correctly annotate the declarators all the time, but someone must actually try it to find out. |
Fixes #83400.