-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the O(N^2) bug of passColName and passRowName #1782
Conversation
@jajhall CI tests pass. It is ready to be reviewed. |
assert(int(search->second) < int(this->name2index.size())); | ||
this->name2index.erase(search); | ||
this->name2index.insert({name[index], kHashIsDuplicate}); | ||
search->second = kHashIsDuplicate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By deleting
this->name2index.erase(search);
this->name2index.insert({name[index], kHashIsDuplicate});
and only setting
search->second = kHashIsDuplicate;
(which has no effect since search is immediately destroyed) the property of marking duplicate names is lost. hence errors due to the same name being given to different columns are not trapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emplace
will only insert when there is no duplication. When there is duplication, search
points to the duplicated key and value, and setting the value will influence the original map.
This is an example: https://godbolt.org/z/YsYYrKrPb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it did cross my mind that there was more to setting
search->second = kHashIsDuplicate;
than I gave you credit for, sorry.
As I said, I'm not fluent in the use of std::unordered_map
:-)
Just spotted the &
in
auto& search = emplace_result.first;
Now I see why setting
search->second = kHashIsDuplicate;
achieves the desired result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. I also learned about the meaning of return values of emplace
today. It is a clever choice to use emplace
to detect duplication.
// Find the original and mark it as duplicate | ||
auto& search = emplace_result.first; | ||
assert(int(search->second) < int(this->name2index.size())); | ||
search->second = kHashIsDuplicate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may pass the CI tests, but that's because there's no unit test to check that duplicate names are marked as such
Lines 68 to 88 in 13363c9
In the beginning, col0_name is unique, so it has corresponding column. But after col0_name is assigned to |
There is still one edge case: if a name is marked as duplicate and the corresponding column is assigned with a new name, then the old name should not be deleted directly (there might be other columns with this old name). However, this is an edge case and checking it would require The final and complete solution is to use https://en.cppreference.com/w/cpp/container/unordered_multimap to map name to column/row if we want to allow multiple columns/rows to have the same name. |
Indeed, there's a limit to making this idiot-proof |
Reland #1780
I also simplify some logic here. The result of
emplace
has contained the duplicate key-value pair if it detects duplication, so we can mark the value as duplicate directly instead of using the slow search-erase-insert combo.