Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address all remaining clang-tidy errors #16956

Merged
merged 17 commits into from
Oct 7, 2024

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Sep 30, 2024

Description

With this set of changes I get a clean run of clang-tidy (with one caveat that I'll explain in the follow-up PR to add clang-tidy to pre-commit/CI).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 30, 2024
@vyasr vyasr self-assigned this Sep 30, 2024
@vyasr vyasr requested review from a team as code owners September 30, 2024 20:41
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Sep 30, 2024
@vyasr vyasr requested a review from davidwendt October 1, 2024 00:01
Copy link
Contributor

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't like the idea of removing the name class when constructing an object. Having the name class makes thing looks more obvious and less error prone. In particular, instead of having this:

class_name func() {
  // ... very long code
  return {a, b, c,...};
}

I prefer to have this:

class_name func() {
  // ... very long code
  return class_name{a, b, c,...};
}

@davidwendt
Copy link
Contributor

Personally I don't like the idea of removing the name class when constructing an object. Having the name class makes thing looks more obvious and less error prone. In particular, instead of having this:

class_name func() {
  // ... very long code
  return {a, b, c,...};
}

I prefer to have this:

class_name func() {
  // ... very long code
  return class_name{a, b, c,...};
}

I agree with this as well but will abide by a majority vote.

@@ -228,7 +228,8 @@ class parquet_field_string : public parquet_field {
* @return True if field types mismatch or if the process of reading a
* string fails
*/
struct parquet_field_string_list : public parquet_field_list<std::string, FieldType::BINARY> {
class parquet_field_string_list : public parquet_field_list<std::string, FieldType::BINARY> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ELI5: Why was this change required? Isn't the old version a valid equivalent?

Is it because of (the equivalent of) the Google Style Guide stating that structs need to be POD/passive objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was necessary because the header file declares this object as a class instead of a struct. We could equivalently have changed the header to use struct instead, but the two should be aligned because under certain compilers (Windows) the results could be ABI-incompatible. See https://clang.llvm.org/docs/DiagnosticsReference.html#wmismatched-tags

Comment on lines +560 to +564
[&lhs_data = lhs_data,
&lhs_mask = lhs_mask,
&rhs_data = rhs_data,
&rhs_mask = rhs_mask,
&result_mask = result_mask](auto i) -> TypeOut {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I've understood the need for this either. Why is repeating the variable name here preferred?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structure binding cannot be captured automatically so we have to use this trick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structure binding cannot be captured automatically...

In which case then, how was it working before? The definitions of lhs_data etc. haven't changed, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were getting lucky because gcc implements this as an extension even when only compiling under C++17, but other compilers will not be so friendly: https://godbolt.org/z/9fnq6Ex84.

@@ -39,19 +39,19 @@ using cudf::device_span;
*/
template <typename Decompressor>
struct DecompressTest : public cudf::test::BaseFixture {
std::vector<uint8_t> vector_from_string(char const* str) const
[[nodiscard]] std::vector<uint8_t> vector_from_string(std::string const str) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why not std::string const& str? It's not like str is being std::move()-ed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I can make this change.

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits and questions. But this LGTM.

@mythrocks
Copy link
Contributor

Personally I don't like the idea of removing the name class when constructing an object.

I agree with this as well.

To me it's a little bit like using auto to define variables for objects returned from a function, whose type isn't immediately evident at the call-site. E.g. std::size_t vs cudf::size_type.

I think I'd prefer the explicit type for clarity, but I'll follow what the team decides.

@vyasr
Copy link
Contributor Author

vyasr commented Oct 3, 2024

Personally I don't like the idea of removing the name class when constructing an object.

I agree with this as well.

To me it's a little bit like using auto to define variables for objects returned from a function, whose type isn't immediately evident at the call-site. E.g. std::size_t vs cudf::size_type.

I think I'd prefer the explicit type for clarity, but I'll follow what the team decides.

I agree with the auto analogy. Even more than Mithun's example, this is the exact mirror of using an auto return type from a function; in that case the signature doesn't have the type because it is inferred from the code, whereas in the case we're discussing here the type is only in the signature and the code grabs it from there. I suspect this preference will also have a lot to do with what other languages a given developer is familiar with.

Personally I like the change, but I don't feel strongly enough to sway the team if other generally prefer the status quo. It seems like the vote is currently 3-1 so I'll go ahead and revert. I'll need to go through and manually revert the changes after removing the rule since these changes are not autofixes, so I'll come back to this tomorrow.

@davidwendt
Copy link
Contributor

Personally I don't like the idea of removing the name class when constructing an object.

I agree with this as well.

To me it's a little bit like using auto to define variables for objects returned from a function, whose type isn't immediately evident at the call-site. E.g. std::size_t vs cudf::size_type.
I think I'd prefer the explicit type for clarity, but I'll follow what the team decides.

I agree with the auto analogy. Even more than Mithun's example, this is the exact mirror of using an auto return type from a function; in that case the signature doesn't have the type because it is inferred from the code, whereas in the case we're discussing here the type is only in the signature and the code grabs it from there. I suspect this preference will also have a lot to do with what other languages a given developer is familiar with.

Personally I like the change, but I don't feel strongly enough to sway the team if other generally prefer the status quo. It seems like the vote is currently 3-1 so I'll go ahead and revert. I'll need to go through and manually revert the changes after removing the rule since these changes are not autofixes, so I'll come back to this tomorrow.

@bdice @shrshi @vuule @PointKernel @karthikeyann @mhaseeb123 @kingcrimsontianyu @lamarr

@vuule
Copy link
Contributor

vuule commented Oct 3, 2024

I like using auto in almost every context. It prevents accidental conversions and my IDE can tell me what type is actually used. I've had to deal with multiple bugs caused by conversions on return, but I don't think I've seen any bugs caused by auto use.

I do understand the annoyance with long functions that return "auto", so I'm not really pushing for such auto use. Fine either way, just wanted to highlight what I see as a benefit.

@PointKernel
Copy link
Member

I prefer the explicit return type for better clarity and readability as well.

Also, given that many of our unary constructors are not marked as explicit but should be, using explicit return types is required in such cases.

@bdice
Copy link
Contributor

bdice commented Oct 3, 2024

I prefer return class_name{a, b, c,...}; over return {a, b, c, …};. However, I’ve gotten used to reading this form without the class name and I’m OK either way.

I am fine with using auto in a function’s declared return type like auto function_name(){}, especially when paired with the explicit class name at the return statement which might have some friendly CTAD simplification.

@vyasr
Copy link
Contributor Author

vyasr commented Oct 4, 2024

I think that is enough votes that I will go ahead and revert this change and remove the relevant clang-tidy rule.

@vyasr
Copy link
Contributor Author

vyasr commented Oct 4, 2024

I have reverted the change. @davidwendt could you take another look and see if there's anything else that stands out? This PR has two approvals but given the discussion and subsequent changes I'd like another pair of eyes on it.

@ttnghia
Copy link
Contributor

ttnghia commented Oct 4, 2024

With the config removed, can we use braces instead of parenthesis in the constructor call? I.e., instead of class_name(...) we use class_name{...}?

@vyasr
Copy link
Contributor Author

vyasr commented Oct 4, 2024

We can, but we have to do it manually. You can't have clang-tidy do that automatically for you. As such, I would prefer not to make such changes in this PR because they're a stylistic improvement that's out of scope of what clang-tidy can do.

Copy link
Contributor

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved CMake changes

@vyasr
Copy link
Contributor Author

vyasr commented Oct 7, 2024

/merge

@rapids-bot rapids-bot bot merged commit 7e1e475 into rapidsai:branch-24.12 Oct 7, 2024
100 checks passed
@vyasr vyasr deleted the feat/clang_tidy_5 branch October 7, 2024 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants