Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"LGTM" tool warns about regexp for "remove_internal_links" function (for PDF build) #852

Closed
DimitriPapadopoulos opened this issue Aug 18, 2021 · 1 comment · Fixed by #915

Comments

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 18, 2021

Here are the two LGTM warnings:

The relevant lines of code:

        primary_pattern = re.compile(r'\[((?!http).[\w\s.\(\)`*/–]+)\]\(((?!http).+(\.md|\.yml|\.md#[\w\-\w]+))\)')  # noqa: E501
        primary_pattern = re.compile(r'\[([\w\s.\(\)`*/–]+)\]\(([#\w\-._\w]+)\)')

Indeed, there are two \w within classes of characters delimited by [] and not groups of characters delimited by ():

  • [\w\-\w]
  • [#\w\-._\w]

See Duplication in regular expression character class:

Character classes in regular expressions represent sets of characters, so there is no need to specify the same character twice in one character class. Duplicate characters in character classes are at best useless, and may even indicate a latent bug.

Recommendation

Determine whether a character is simply duplicated or whether the character class was in fact meant as a group. If it is just a duplicate, then remove the duplicate character. If was supposed to be a group, then replace the square brackets with parentheses.

@sappelhoff sappelhoff changed the title LGTM warning "LGTM" tool warns about regexp for "remove_internal_links" function (for PDF build) Aug 23, 2021
@sappelhoff
Copy link
Member

as mentioned in #855 (review), this should probably be fixed as part of #596

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants