Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callable missing type argument, making some methods partially-Unknown #60

Closed
Avasam opened this issue Feb 11, 2024 · 6 comments
Closed
Labels
bug Something isn't working

Comments

@Avasam
Copy link

Avasam commented Feb 11, 2024

For example
image

If it can truly be "anything", you can use Callable[..., Any], but I assume it's meant to be Callable[[Sequence[Hashable]], Sequence[Hashable]]

While at it, Callable, Hashable and Sequence should all be imported from collections.abc.
List and Tuple also don't need to be imported from typing in a type stub, the stdlib names (list and tuple) can be used directly.
Union and Optional can use PEP 604 syntax

@maxbachmann
Copy link
Member

I somewhat improved the type hint in https://github.com/rapidfuzz/Levenshtein/blob/main/src/Levenshtein/__init__.pyi to

processor: Callable[..., Sequence[Hashable]] | None = None,

The problem with Callable[[Sequence[Hashable]], Sequence[Hashable]] is that it actually enforces the function to be more generic than it has to be. So e.g. the following wouldn't be allowed anymore:

def func(a: str) -> str:
    return a

Levenshtein.ratio("test", "test2", processor=func)

The requirement would be that processor can be called using both s1 and s2. Not sure how I can express that using type hints though. I know I tried a while back and failed 😅

@Avasam
Copy link
Author

Avasam commented Feb 11, 2024

@maxbachmann ... allows any number of arguments of any type (which the current type hints already do), but won't lead to the partially unknown return type I mentioned above
image

d64d503 (and follow-up fixes) should close this issue :)

@maxbachmann
Copy link
Member

btw your implementation in Toufool/AutoSplit#272 could be made significantly faster if you have to compare larger amounts of texts:

# Levenshtein is just a wrapper for rapidfuzz at this point and rapidfuzz provides quite a few more features + is MIT instead of GPL licensed
from rapidfuzz import process
# Levenshtein.ratio is a normalized Indel similarity (levenshtein with subsitution weight 2).
# Was this intended? Otherwise you should use Levenshtein
from rapidfuzz.distance import Indel


process.extractOne(image_string, texts, scorer=Indel.normalized_similarity)

If you only have a few texts, this doesn't really matter though

@maxbachmann maxbachmann added the bug Something isn't working label Feb 11, 2024
@Avasam
Copy link
Author

Avasam commented Feb 11, 2024

Unfortunately that area of code is heavily bottleneck by optical text recognition, at around 1 cycle every 0.25s in the best conditions so far.

But I'll definitely take a look at, and consider using directly, rapidfuzz. Especially if I'm just given the code like that ^^
Their partial ratio might actually be of interest for us. CC @ston1th as you might be interested in this (but let's keep our discussion to your PR as this is off-topic for this issue about type annotations)

@maxbachmann
Copy link
Member

Unfortunately that area of code is heavily bottleneck by optical text recognition, at around 1 cycle every 0.25s in the best conditions so far.

Yes then it probably will not make much of a difference. It's going to be a lot faster than that anyway.

@maxbachmann
Copy link
Member

I published v0.25.0 which includes the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants