Typo #46

Closed
wants to merge 6 commits into from

Changes from 3 commits
39 changes: 12 additions & 27 deletions decepticonlp/transforms/perturbations.py
@@ -203,23 +203,12 @@ def apply(self, word: str, **kwargs):

         assert " " not in word, self.get_string_not_a_word_error_msg()

-        # convert word to list (string is immutable)
-        word = list(word)

-        num_chars_to_shift = math.ceil(len(word) * kwargs.get("probability", 0.1))

-        # checking for capitalizations
-        capitalization = [False] * len(word)

-        # convert to lowercase and record capitalization
-        for i in range(len(word)):
-            capitalization[i] = word[i].isupper()
-            word[i] = word[i].lower()
+        chars = len(word)
+        num_chars_to_shift = math.ceil(chars * kwargs.get("probability", 0.1))

         # list of characters to be switched
-        positions_to_shift = []
-        for i in range(num_chars_to_shift):
-            positions_to_shift.append(random.randint(0, len(word) - 1))
+        positions_to_shift = random.sample(range(chars), num_chars_to_shift)

         # defining a dictionary of keys located close to each character
         keys_in_proximity = {
@@ -251,23 +240,19 @@ def apply(self, word: str, **kwargs):
             "z": ["a", "s", "x"],
         }

-        # insert typo
-        for pos in positions_to_shift:
-            # no typo insertion for special characters
-            try:
-                typo_list = keys_in_proximity[word[pos]]
-                word[pos] = random.choice(typo_list)
-            except:
-                break
+        for i, c in enumerate(word):
+            # Check Upper
+            cap = c.isupper()

-        # reinsert capitalization
-        for i in range(len(word)):
-            if capitalization[i]:
-                word[i] = word[i].upper()
+            # Check if in position and given keys
+            if i in positions_to_shift and c in keys_in_proximity:
+                word[i] = random.choice(keys_in_proximity[c])
+                if cap:
+                    # convert to upper if in upper
+                    word[i] = word[i].upper()
Member

Add a test case to cover this as well.
Would that resolve the coverage failure?

Member Author

The test case is already there. Does code coverage count the number of test cases for a piece of logic?

Member Author

A simpler fix would be to extend the dictionary to capital letters.

Pros - We could define non-alphabetical typos as well. The current implementation and this PR both assume that typos involve letters only, but a typo could just as well put a number in between. We could also define different typos for capital letters.

Cons - It will clutter the code.

Should I save this dictionary as a JSON file instead? That way, if we want to make changes in the future, we don't have to touch the code.
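
For illustration, a minimal sketch of the JSON idea, assuming the map is stored with lowercase keys and mirrored for capitals at load time. The function name `load_keys_in_proximity`, the file name, and the `"1"` entry are hypothetical and not part of this PR; only the `"z"` entry is taken from the diff.

```python
import json
import tempfile
from pathlib import Path


def load_keys_in_proximity(path):
    """Load a proximity map from JSON and add mirrored uppercase entries.

    Hypothetical helper: the PR keeps the dictionary inline in apply().
    """
    with open(path, encoding="utf-8") as handle:
        lower_map = json.load(handle)

    full_map = dict(lower_map)
    for key, neighbours in lower_map.items():
        # Mirror only keys that have a distinct uppercase form;
        # digits and symbols are left as they are.
        if key.upper() != key:
            full_map[key.upper()] = [n.upper() for n in neighbours]
    return full_map


# Usage with a tiny excerpt of the map written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "keys_in_proximity.json"  # assumed file name
    sample = {"z": ["a", "s", "x"], "1": ["2", "q"]}  # "1" entry is made up
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    keys = load_keys_in_proximity(sample_path)
```

Mirroring at load time keeps the JSON file small while still letting a capital letter be looked up directly. Separate typo lists for capitals, as suggested above, would instead need explicit uppercase keys in the file.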


         # recombine
         word = "".join(word)

         return word
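
As the diff reads here, the new loop assigns to `word[i]` while the `word = list(word)` conversion is among the removed lines. A `str` does not support item assignment, so that would raise a `TypeError`. Below is a minimal standalone sketch of the same idea that edits a list copy instead. The function name `insert_typos` is hypothetical. Looking characters up in lowercase is also an assumption, since the PR checks `c in keys_in_proximity` directly. `probability` is assumed to lie in [0, 1].

```python
import math
import random


def insert_typos(word, keys_in_proximity, probability=0.1):
    """Replace a random subset of characters with keyboard neighbours.

    Hypothetical standalone version of the loop in this diff.
    """
    chars = list(word)  # strings are immutable, so edit a list copy
    num_chars_to_shift = math.ceil(len(chars) * probability)
    # random.sample picks distinct positions (no repeats)
    positions_to_shift = set(random.sample(range(len(chars)), num_chars_to_shift))

    for i, c in enumerate(chars):
        lookup = c.lower()  # assumption: the map only has lowercase keys
        if i in positions_to_shift and lookup in keys_in_proximity:
            replacement = random.choice(keys_in_proximity[lookup])
            # keep the capitalization of the original character
            chars[i] = replacement.upper() if c.isupper() else replacement

    return "".join(chars)
```

With `probability=1.0` every position is selected, so a call such as `insert_typos("Zz", {"z": ["a", "s", "x"]}, probability=1.0)` exercises both the uppercase and the lowercase path. Characters missing from the map are left unchanged.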

