Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang-format] Handle common C++ non-keyword types as such #83709

Merged
merged 6 commits into from
Mar 9, 2024

Conversation

owenca
Copy link
Contributor

@owenca owenca commented Mar 3, 2024

Fixes #83400.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 3, 2024

@llvm/pr-subscribers-clang-format

Author: Owen Pan (owenca)

Changes

Fixes #83400.


Full diff: https://github.com/llvm/llvm-project/pull/83709.diff

7 Files Affected:

  • (modified) clang/lib/Format/FormatToken.cpp (+13-3)
  • (modified) clang/lib/Format/FormatToken.h (+2-2)
  • (modified) clang/lib/Format/QualifierAlignmentFixer.cpp (+15-10)
  • (modified) clang/lib/Format/QualifierAlignmentFixer.h (+3-2)
  • (modified) clang/lib/Format/TokenAnnotator.cpp (+16-13)
  • (modified) clang/lib/Format/UnwrappedLineParser.cpp (+6-6)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+6)
diff --git a/clang/lib/Format/FormatToken.cpp b/clang/lib/Format/FormatToken.cpp
index 56a7b2d6387765..02952bd20acf9a 100644
--- a/clang/lib/Format/FormatToken.cpp
+++ b/clang/lib/Format/FormatToken.cpp
@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector<StringRef> CppNonKeywordTypes = {
+    "byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+    "size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",
+};
+
 // FIXME: This is copy&pasted from Sema. Put it in a common place and remove
 // duplication.
-bool FormatToken::isSimpleTypeSpecifier() const {
+bool FormatToken::isSimpleTypeSpecifier(bool IsCpp) const {
   switch (Tok.getKind()) {
   case tok::kw_short:
   case tok::kw_long:
@@ -66,13 +72,17 @@ bool FormatToken::isSimpleTypeSpecifier() const {
   case tok::kw_decltype:
   case tok::kw__Atomic:
     return true;
+  case tok::identifier:
+    return IsCpp && std::binary_search(CppNonKeywordTypes.begin(),
+                                       CppNonKeywordTypes.end(), TokenText);
   default:
     return false;
   }
 }
 
-bool FormatToken::isTypeOrIdentifier() const {
-  return isSimpleTypeSpecifier() || Tok.isOneOf(tok::kw_auto, tok::identifier);
+bool FormatToken::isTypeOrIdentifier(bool IsCpp) const {
+  return isSimpleTypeSpecifier(IsCpp) ||
+         Tok.isOneOf(tok::kw_auto, tok::identifier);
 }
 
 bool FormatToken::isBlockIndentedInitRBrace(const FormatStyle &Style) const {
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index 31245495041960..2bc136c51d23f1 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -674,9 +674,9 @@ struct FormatToken {
   }
 
   /// Determine whether the token is a simple-type-specifier.
-  [[nodiscard]] bool isSimpleTypeSpecifier() const;
+  [[nodiscard]] bool isSimpleTypeSpecifier(bool IsCpp) const;
 
-  [[nodiscard]] bool isTypeOrIdentifier() const;
+  [[nodiscard]] bool isTypeOrIdentifier(bool IsCpp) const;
 
   bool isObjCAccessSpecifier() const {
     return is(tok::at) && Next &&
diff --git a/clang/lib/Format/QualifierAlignmentFixer.cpp b/clang/lib/Format/QualifierAlignmentFixer.cpp
index 0c63d330b5aed4..834ae115908856 100644
--- a/clang/lib/Format/QualifierAlignmentFixer.cpp
+++ b/clang/lib/Format/QualifierAlignmentFixer.cpp
@@ -268,11 +268,13 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeRight(
   if (isPossibleMacro(TypeToken))
     return Tok;
 
+  const bool IsCpp = Style.isCpp();
+
   // The case `const long long int volatile` -> `long long int const volatile`
   // The case `long const long int volatile` -> `long long int const volatile`
   // The case `long long volatile int const` -> `long long int const volatile`
   // The case `const long long volatile int` -> `long long int const volatile`
-  if (TypeToken->isSimpleTypeSpecifier()) {
+  if (TypeToken->isSimpleTypeSpecifier(IsCpp)) {
     // The case `const decltype(foo)` -> `const decltype(foo)`
     // The case `const typeof(foo)` -> `const typeof(foo)`
     // The case `const _Atomic(foo)` -> `const _Atomic(foo)`
@@ -280,8 +282,10 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeRight(
       return Tok;
 
     const FormatToken *LastSimpleTypeSpecifier = TypeToken;
-    while (isQualifierOrType(LastSimpleTypeSpecifier->getNextNonComment()))
+    while (isQualifierOrType(LastSimpleTypeSpecifier->getNextNonComment(),
+                             IsCpp)) {
       LastSimpleTypeSpecifier = LastSimpleTypeSpecifier->getNextNonComment();
+    }
 
     rotateTokens(SourceMgr, Fixes, Tok, LastSimpleTypeSpecifier,
                  /*Left=*/false);
@@ -291,7 +295,7 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeRight(
   // The case  `unsigned short const` -> `unsigned short const`
   // The case:
   // `unsigned short volatile const` -> `unsigned short const volatile`
-  if (PreviousCheck && PreviousCheck->isSimpleTypeSpecifier()) {
+  if (PreviousCheck && PreviousCheck->isSimpleTypeSpecifier(IsCpp)) {
     if (LastQual != Tok)
       rotateTokens(SourceMgr, Fixes, Tok, LastQual, /*Left=*/false);
     return Tok;
@@ -408,11 +412,12 @@ const FormatToken *LeftRightQualifierAlignmentFixer::analyzeLeft(
   // The case `volatile long long const int` -> `const volatile long long int`
   // The case `const long long volatile int` -> `const volatile long long int`
   // The case `long volatile long int const` -> `const volatile long long int`
-  if (TypeToken->isSimpleTypeSpecifier()) {
+  if (const bool IsCpp = Style.isCpp();
+      TypeToken->isSimpleTypeSpecifier(IsCpp)) {
     const FormatToken *LastSimpleTypeSpecifier = TypeToken;
     while (isConfiguredQualifierOrType(
         LastSimpleTypeSpecifier->getPreviousNonComment(),
-        ConfiguredQualifierTokens)) {
+        ConfiguredQualifierTokens, IsCpp)) {
       LastSimpleTypeSpecifier =
           LastSimpleTypeSpecifier->getPreviousNonComment();
     }
@@ -611,15 +616,15 @@ void prepareLeftRightOrderingForQualifierAlignmentFixer(
 }
 
 bool LeftRightQualifierAlignmentFixer::isQualifierOrType(
-    const FormatToken *const Tok) {
-  return Tok && (Tok->isSimpleTypeSpecifier() || Tok->is(tok::kw_auto) ||
+    const FormatToken *const Tok, bool IsCpp) {
+  return Tok && (Tok->isSimpleTypeSpecifier(IsCpp) || Tok->is(tok::kw_auto) ||
                  isQualifier(Tok));
 }
 
 bool LeftRightQualifierAlignmentFixer::isConfiguredQualifierOrType(
-    const FormatToken *const Tok,
-    const std::vector<tok::TokenKind> &Qualifiers) {
-  return Tok && (Tok->isSimpleTypeSpecifier() || Tok->is(tok::kw_auto) ||
+    const FormatToken *const Tok, const std::vector<tok::TokenKind> &Qualifiers,
+    bool IsCpp) {
+  return Tok && (Tok->isSimpleTypeSpecifier(IsCpp) || Tok->is(tok::kw_auto) ||
                  isConfiguredQualifier(Tok, Qualifiers));
 }
 
diff --git a/clang/lib/Format/QualifierAlignmentFixer.h b/clang/lib/Format/QualifierAlignmentFixer.h
index e922d800559510..e1cc27e62b13a0 100644
--- a/clang/lib/Format/QualifierAlignmentFixer.h
+++ b/clang/lib/Format/QualifierAlignmentFixer.h
@@ -71,10 +71,11 @@ class LeftRightQualifierAlignmentFixer : public TokenAnalyzer {
                                  tok::TokenKind QualifierType);
 
   // Is the Token a simple or qualifier type
-  static bool isQualifierOrType(const FormatToken *Tok);
+  static bool isQualifierOrType(const FormatToken *Tok, bool IsCpp = true);
   static bool
   isConfiguredQualifierOrType(const FormatToken *Tok,
-                              const std::vector<tok::TokenKind> &Qualifiers);
+                              const std::vector<tok::TokenKind> &Qualifiers,
+                              bool IsCpp = true);
 
   // Is the Token likely a Macro
   static bool isPossibleMacro(const FormatToken *Tok);
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 04f0374b674e53..3c5d7326224eca 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -562,7 +562,7 @@ class AnnotatingParser {
           (CurrentToken->is(tok::l_paren) && CurrentToken->Next &&
            CurrentToken->Next->isOneOf(tok::star, tok::amp, tok::caret));
       if ((CurrentToken->Previous->isOneOf(tok::kw_const, tok::kw_auto) ||
-           CurrentToken->Previous->isSimpleTypeSpecifier()) &&
+           CurrentToken->Previous->isSimpleTypeSpecifier(IsCpp)) &&
           !(CurrentToken->is(tok::l_brace) ||
             (CurrentToken->is(tok::l_paren) && !ProbablyFunctionTypeLParen))) {
         Contexts.back().IsExpression = false;
@@ -2573,7 +2573,7 @@ class AnnotatingParser {
       return true;
 
     // MyClass a;
-    if (PreviousNotConst->isSimpleTypeSpecifier())
+    if (PreviousNotConst->isSimpleTypeSpecifier(IsCpp))
       return true;
 
     // type[] a in Java
@@ -2704,9 +2704,10 @@ class AnnotatingParser {
     }
 
     // Heuristically try to determine whether the parentheses contain a type.
-    auto IsQualifiedPointerOrReference = [](FormatToken *T) {
+    auto IsQualifiedPointerOrReference = [this](FormatToken *T) {
       // This is used to handle cases such as x = (foo *const)&y;
-      assert(!T->isSimpleTypeSpecifier() && "Should have already been checked");
+      assert(!T->isSimpleTypeSpecifier(IsCpp) &&
+             "Should have already been checked");
       // Strip trailing qualifiers such as const or volatile when checking
       // whether the parens could be a cast to a pointer/reference type.
       while (T) {
@@ -2738,7 +2739,7 @@ class AnnotatingParser {
     bool ParensAreType =
         !Tok.Previous ||
         Tok.Previous->isOneOf(TT_TemplateCloser, TT_TypeDeclarationParen) ||
-        Tok.Previous->isSimpleTypeSpecifier() ||
+        Tok.Previous->isSimpleTypeSpecifier(IsCpp) ||
         IsQualifiedPointerOrReference(Tok.Previous);
     bool ParensCouldEndDecl =
         Tok.Next->isOneOf(tok::equal, tok::semi, tok::l_brace, tok::greater);
@@ -3595,7 +3596,8 @@ static bool isFunctionDeclarationName(bool IsCpp, const FormatToken &Current,
   if (!Current.Tok.getIdentifierInfo())
     return false;
 
-  auto skipOperatorName = [](const FormatToken *Next) -> const FormatToken * {
+  auto skipOperatorName =
+      [IsCpp](const FormatToken *Next) -> const FormatToken * {
     for (; Next; Next = Next->Next) {
       if (Next->is(TT_OverloadedOperatorLParen))
         return Next;
@@ -3614,7 +3616,7 @@ static bool isFunctionDeclarationName(bool IsCpp, const FormatToken &Current,
         Next = Next->Next;
         continue;
       }
-      if ((Next->isSimpleTypeSpecifier() || Next->is(tok::identifier)) &&
+      if ((Next->isSimpleTypeSpecifier(IsCpp) || Next->is(tok::identifier)) &&
           Next->Next && Next->Next->isPointerOrReference()) {
         // For operator void*(), operator char*(), operator Foo*().
         Next = Next->Next;
@@ -3712,7 +3714,7 @@ static bool isFunctionDeclarationName(bool IsCpp, const FormatToken &Current,
       Tok = Tok->MatchingParen;
       continue;
     }
-    if (Tok->is(tok::kw_const) || Tok->isSimpleTypeSpecifier() ||
+    if (Tok->is(tok::kw_const) || Tok->isSimpleTypeSpecifier(IsCpp) ||
         Tok->isOneOf(TT_PointerOrReference, TT_StartOfName, tok::ellipsis,
                      TT_TypeName)) {
       return true;
@@ -4376,7 +4378,7 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
     if (Left.Tok.isLiteral())
       return true;
     // for (auto a = 0, b = 0; const auto & c : {1, 2, 3})
-    if (Left.isTypeOrIdentifier() && Right.Next && Right.Next->Next &&
+    if (Left.isTypeOrIdentifier(IsCpp) && Right.Next && Right.Next->Next &&
         Right.Next->Next->is(TT_RangeBasedForLoopColon)) {
       return getTokenPointerOrReferenceAlignment(Right) !=
              FormatStyle::PAS_Left;
@@ -4419,8 +4421,8 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
     if (Right.is(tok::l_brace) && Right.is(BK_Block))
       return true;
     // for (auto a = 0, b = 0; const auto& c : {1, 2, 3})
-    if (Left.Previous && Left.Previous->isTypeOrIdentifier() && Right.Next &&
-        Right.Next->is(TT_RangeBasedForLoopColon)) {
+    if (Left.Previous && Left.Previous->isTypeOrIdentifier(IsCpp) &&
+        Right.Next && Right.Next->is(TT_RangeBasedForLoopColon)) {
       return getTokenPointerOrReferenceAlignment(Left) !=
              FormatStyle::PAS_Right;
     }
@@ -4458,7 +4460,8 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
   if (Right.isPointerOrReference()) {
     const FormatToken *Previous = &Left;
     while (Previous && Previous->isNot(tok::kw_operator)) {
-      if (Previous->is(tok::identifier) || Previous->isSimpleTypeSpecifier()) {
+      if (Previous->is(tok::identifier) ||
+          Previous->isSimpleTypeSpecifier(IsCpp)) {
         Previous = Previous->getPreviousNonComment();
         continue;
       }
@@ -4647,7 +4650,7 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
   if (!Style.isVerilog() &&
       (Left.isOneOf(tok::identifier, tok::greater, tok::r_square,
                     tok::r_paren) ||
-       Left.isSimpleTypeSpecifier()) &&
+       Left.isSimpleTypeSpecifier(IsCpp)) &&
       Right.is(tok::l_brace) && Right.getNextNonComment() &&
       Right.isNot(BK_Block)) {
     return false;
diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp
index 2ce291da11b86a..0b96dd80252ad1 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -1866,7 +1866,7 @@ void UnwrappedLineParser::parseStructuralElement(
       nextToken();
       // Block return type.
       if (FormatTok->Tok.isAnyIdentifier() ||
-          FormatTok->isSimpleTypeSpecifier()) {
+          FormatTok->isSimpleTypeSpecifier(IsCpp)) {
         nextToken();
         // Return types: pointers are ok too.
         while (FormatTok->is(tok::star))
@@ -2222,7 +2222,7 @@ bool UnwrappedLineParser::tryToParseLambda() {
   bool InTemplateParameterList = false;
 
   while (FormatTok->isNot(tok::l_brace)) {
-    if (FormatTok->isSimpleTypeSpecifier()) {
+    if (FormatTok->isSimpleTypeSpecifier(IsCpp)) {
       nextToken();
       continue;
     }
@@ -3415,7 +3415,7 @@ bool clang::format::UnwrappedLineParser::parseRequires() {
     break;
   }
   default:
-    if (PreviousNonComment->isTypeOrIdentifier()) {
+    if (PreviousNonComment->isTypeOrIdentifier(IsCpp)) {
       // This is a requires clause.
       parseRequiresClause(RequiresToken);
       return true;
@@ -3478,7 +3478,7 @@ bool clang::format::UnwrappedLineParser::parseRequires() {
       --OpenAngles;
       break;
     default:
-      if (NextToken->isSimpleTypeSpecifier()) {
+      if (NextToken->isSimpleTypeSpecifier(IsCpp)) {
         FormatTok = Tokens->setPosition(StoredPosition);
         parseRequiresExpression(RequiresToken);
         return false;
@@ -3962,8 +3962,8 @@ void UnwrappedLineParser::parseRecord(bool ParseAsExpr) {
       }
       if (FormatTok->is(tok::l_square)) {
         FormatToken *Previous = FormatTok->Previous;
-        if (!Previous ||
-            !(Previous->is(tok::r_paren) || Previous->isTypeOrIdentifier())) {
+        if (!Previous || (Previous->isNot(tok::r_paren) &&
+                          !Previous->isTypeOrIdentifier(IsCpp))) {
           // Don't try parsing a lambda if we had a closing parenthesis before,
           // it was probably a pointer to an array: int (*)[].
           if (!tryToParseLambda())
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index c736dac8dabf21..49382b595927e1 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -620,6 +620,12 @@ TEST_F(TokenAnnotatorTest, UnderstandsCasts) {
   ASSERT_EQ(Tokens.size(), 8u) << Tokens;
   EXPECT_TOKEN(Tokens[3], tok::r_paren, TT_Unknown);
   EXPECT_TOKEN(Tokens[4], tok::amp, TT_BinaryOperator);
+
+  Tokens = annotate("#define FOO(bar) foo((uint64_t)&bar)");
+  ASSERT_EQ(Tokens.size(), 15u) << Tokens;
+  EXPECT_TOKEN(Tokens[10], tok::r_paren, TT_CastRParen);
+  // FIXME: should be TT_PointerOrReference instead.
+  EXPECT_TOKEN(Tokens[11], tok::amp, TT_UnaryOperator);
 }
 
 TEST_F(TokenAnnotatorTest, UnderstandsDynamicExceptionSpecifier) {

clang/lib/Format/FormatToken.cpp Outdated Show resolved Hide resolved
clang/lib/Format/FormatToken.cpp Outdated Show resolved Hide resolved
clang/lib/Format/FormatToken.cpp Outdated Show resolved Hide resolved
clang/lib/Format/FormatToken.cpp Outdated Show resolved Hide resolved
@zygoloid
Copy link
Collaborator

zygoloid commented Mar 3, 2024

Another possibility to consider for the original bug: (single_identifier) is almost certainly a cast, not redundant parentheses, unless single_identifier names a macro argument. So I wonder if that would be a better heuristic to use to fix the regression.

@owenca
Copy link
Contributor Author

owenca commented Mar 3, 2024

Another possibility to consider for the original bug: (single_identifier) is almost certainly a cast, not redundant parentheses, unless single_identifier names a macro argument. So I wonder if that would be a better heuristic to use to fix the regression.

I don’t think that would work as(identifier) can be a number of constructs depending on the context. It can be part of a function/macro call, function declaration, macro definition, C++ cast, C cast, sizeof operand, etc.

@zygoloid
Copy link
Collaborator

zygoloid commented Mar 3, 2024

Another possibility to consider for the original bug: (single_identifier) is almost certainly a cast, not redundant parentheses, unless single_identifier names a macro argument. So I wonder if that would be a better heuristic to use to fix the regression.

I don’t think that would work as(identifier) can be a number of constructs depending on the context. It can be part of a function/macro call, function declaration, macro definition, C++ cast, C cast, sizeof operand, etc.

OK, but the context is deciding whether (identifier)&x or (identifier)*x is a binary operator or a cast. Given no other information, the fact that it's a single identifier seems like a very strong signal that it's a cast.

I don't think this PR fixes #83400 -- handling a small handful of common types won't address the same issue for the much larger class of cases where the type name is not one of these types. Eg, in #83400 itself the examples included:

-		static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type)*VariantInternal::get_int(v); }    \
+		static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type) * VariantInternal::get_int(v); }  \

But looking for a parenthesized single identifier addresses all of the examples in the issue.

@owenca
Copy link
Contributor Author

owenca commented Mar 3, 2024

Another possibility to consider for the original bug: (single_identifier) is almost certainly a cast, not redundant parentheses, unless single_identifier names a macro argument. So I wonder if that would be a better heuristic to use to fix the regression.

I don’t think that would work as(identifier) can be a number of constructs depending on the context. It can be part of a function/macro call, function declaration, macro definition, C++ cast, C cast, sizeof operand, etc.

OK, but the context is deciding whether (identifier)&x or (identifier)*x is a binary operator or a cast. Given no other information, the fact that it's a single identifier seems like a very strong signal that it's a cast.

I don't think this PR fixes #83400 -- handling a small handful of common types won't address the same issue for the much larger class of cases where the type name is not one of these types. Eg, in #83400 itself the examples included:

-		static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type)*VariantInternal::get_int(v); }    \
+		static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type) * VariantInternal::get_int(v); }  \

But looking for a parenthesized single identifier addresses all of the examples in the issue.

This patch does not only fix formatting of C-casting to a C++ standard type. It correctly identifies (most of) such types and might have fixed other kinds of bugs. For casting to a user-defined type, the TypeNames option can be used.

@zygoloid
Copy link
Collaborator

zygoloid commented Mar 3, 2024

This patch does not only fix formatting of C-casting to a C++ standard type. It correctly identifies (most of) such types and might have fixed other kinds of bugs.

Sure, this patch seems like a good change. But it does not fix #83400.

@owenca
Copy link
Contributor Author

owenca commented Mar 3, 2024

This patch does not only fix formatting of C-casting to a C++ standard type. It correctly identifies (most of) such types and might have fixed other kinds of bugs.

Sure, this patch seems like a good change. But it does not fix #83400.

It does fix the example given. For other diffs involving user-defined types, the user need to use the TypeNames option. (If that doesn't work, a separate issue should be filed.)

@owenca
Copy link
Contributor Author

owenca commented Mar 3, 2024

But looking for a parenthesized single identifier addresses all of the examples in the issue.

I don't think it would work, but you are welcome to submit a patch to prove me wrong. 🙂

Removed `byte` and added `intptr_t`/`uintptr_t`.
FormatToken::isTypeName(). Also added test cases for user-defined types.
@zygoloid
Copy link
Collaborator

zygoloid commented Mar 4, 2024

It does fix the example given.

#83400 has 6 real-world examples. This patch fixes none of them. It also has a reduced testcase, which this patch does fix. But fixing the reduced testcase without fixing the real-world examples is not fixing the bug.

@owenca
Copy link
Contributor Author

owenca commented Mar 5, 2024

It does fix the example given.

#83400 has 6 real-world examples. This patch fixes none of them. It also has a reduced testcase, which this patch does fix. But fixing the reduced testcase without fixing the real-world examples is not fixing the bug.

You took what I said out of context. I'll reiterate it below:

It does fix the example given. For other diffs involving user-defined types, the user need to use the TypeNames option. (If that doesn't work, a separate issue should be filed.)

So again, this patch fixes the test case that "reproduces the issue" (in the issue author's own words), and I already explained to you that the other 6 diffs (which are of the same construct, i.e. casting an address to a user-defined type in a macro definition) require the user to use the TypeNames option. (See the added test cases in fcae75d above.)

If the author of #83400 is still not satisfied, there's nothing to stop them from reopening the issue or filing a new issue. However, that's highly unlikely as they already expressed their view on it:

But you can also see in the same PR godotengine/godot#88959 that other cases where & is a bitwise operator were fixed by clang-format. I'm not sure there's any easy way for clang-format to know which is which, if so I understand if this is considered a "won't fix".

@mydeveloperday
Copy link
Contributor

But looking for a parenthesized single identifier addresses all of the examples in the issue.

I don't think it would work, but you are welcome to submit a patch to prove me wrong. 🙂

+1 for this, it will be the 95%/5% rule, you might think all cases of (identifier) is a cast, but I'm pretty sure it won't be I'm with @owenca on this, this patch will move us closer, meaning users don't need to add common types to TypeNames

@mydeveloperday mydeveloperday self-requested a review March 5, 2024 11:27
return isSimpleTypeSpecifier() || Tok.isOneOf(tok::kw_auto, tok::identifier);
// Sorted common C++ non-keyword types.
static SmallVector<StringRef> CppNonKeywordTypes = {
"int16_t", "int32_t", "int64_t", "int8_t", "intptr_t", "size_t",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssize_t too? should everything be considered? https://en.cppreference.com/w/cpp/types/integer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssize_t is a POSIX type. Regarding the fixed-width integer types, I wanted to start with a few common ones. (I will add ptrdiff_t though.)

clang/lib/Format/QualifierAlignmentFixer.cpp Show resolved Hide resolved
EXPECT_TOKEN(Tokens[10], tok::r_paren, TT_CastRParen);
EXPECT_TOKEN(Tokens[11], tok::amp, TT_UnaryOperator);

Tokens = annotate("#define FOO(bar) foo((time_t) & bar)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't have time_t in the list did you? does that mean we are just checking this one uint64_t type above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally excluded time_t, but on second thought I'll add it, along with clock_t.

@zygoloid
Copy link
Collaborator

zygoloid commented Mar 5, 2024

+1 for this, it will be the 95%/5% rule, you might think all cases of (identifier) is a cast, but I'm pretty sure it won't be

Specifically, what I suggested was: when disambiguating whether (identifier)&... or (identifier)*... is a cast vs a multiplication or bitwise and (that is, the case that the previous patch changed, resulting in the regression we're trying to fix), treat it as a cast unless identifier is a macro parameter name. I think that will be right >99% of the time. And even when clang-format gets it wrong, the programmer can fix that by removing the redundant and non-idiomatic parentheses around the identifier.

I'm with @owenca on this, this patch will move us closer, meaning users don't need to add common types to TypeNames

As I said before, I think special-casing these typedefs is a good thing - it's reasonable for clang-format to assume they're type names. But I don't think this moves us much closer to fixing the bug, because most casts won't be to one of these typedefs.

I do wonder, though -- there's a lot of churn in this patch adding and wiring through a new parameter. Can the TypeNames mechanism be prepopulated with this list of types instead?

@mydeveloperday
Copy link
Contributor

Just thinking out loud and maybe not for this patch but for a more general solution, what we seem to never do is collect information about types local to the file as we go; For example what if a first pass identified types from declarations and function argument declarations, return types, template arguments and added them to a structure that would have some probability that a given identifier was actually a type.

Foo x;
or Bar(Foo x)

This would give "Foo" some sort of weight to it being a type, which then would give more promenance to it being a cast than a binary operation.

The problem I see and why I think I agree more with @owenca is people love to wrap logical operations in '()' so I think assuming (a) is a cast is hard especailly for the (a) * (b) cases, but if we had already see a as a type we'd at least have some more confidence that a (a) was a cast.

Given the example below the macro argument doesn't tell me its a type, but the <m_type> does as does the return type of get()

By the time we'd arrived at the (m_type)* we'd already have a good idea that this m_type was a type and not a variable and so by definition we'd say it was a cast

 #define VARIANT_ACCESSOR_NUMBER(m_type)                                                                        \
 	template <>                                                                                                \
 	struct VariantInternalAccessor<m_type> {                                                                   \
-		static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type)*VariantInternal::get_int(v); }    \
+		static _FORCE_INLINE_ m_type get(const Variant *v) { return (m_type) * VariantInternal::get_int(v); }  \
 		static _FORCE_INLINE_ void set(Variant *v, m_type p_value) { *VariantInternal::get_int(v) = p_value; } \
 	};

There is some precident for this in using the "majority" rule of something local to the file (from Jaspers original slides)

More important problems
int *a; or int* a;
● Clang-format has an adaptive mode:
○ Count cases in input
○ Take majority vote

Any thoughts?

@owenca
Copy link
Contributor Author

owenca commented Mar 9, 2024

Can the TypeNames mechanism be prepopulated with this list of types instead?

I don't see how that would work. Please note that the TypeNames option is not specific to C/C++/Objective-C. Otherwise, I could simply change isTypeName(bool IsCpp) to isCppTypeName().

@owenca
Copy link
Contributor Author

owenca commented Mar 9, 2024

Just thinking out loud and maybe not for this patch but for a more general solution, what we seem to never do is collect information about types local to the file as we go; For example what if a first pass identified types from declarations and function argument declarations, return types, template arguments and added them to a structure that would have some probability that a given identifier was actually a type.

I'm kind of skeptical because we can't even correctly annotate the declarators all the time, but someone must actually try it to find out.

@owenca owenca merged commit 0baef3b into llvm:main Mar 9, 2024
4 checks passed
@owenca owenca deleted the 83400 branch March 9, 2024 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang-format: Wrong spacing in macro with C-style cast and address-of operator
4 participants