Show a "Did you mean:" suggestion for thrown InvalidAttributeErrors #4394
base: master
Conversation
I think I'd not print |
You mean like " I can try that. I don't know how that will look for suggestions that aren't just missing typos but include transposed letters for example. I can probably just highlight the difference between the longest common subsequence. I put it as a list since not all attributes will have suggestions necessarily, so e.g. the example above might not match the order of the passed attributes if it was |
I thought these algorithms that find similar options go by edit distance, so I would mark every edit (except deletions) if possible. |
Cool :) I guess I would have expected both a and o in ssao to be colored but that's minor |
Yeah I'm not sure why that substitution isn't highlighted.. maybe it's counted as a deletion (I don't show the deletions since it makes it harder to read, only additions and substitutions) |
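To make the highlighting discussion above concrete, here is a minimal sketch of marking additions and substitutions (but not deletions) in a suggestion. It assumes plain Levenshtein edits with a backtrace that prefers substitutions over insert/delete pairs. The names `edit_mask` and `print_suggestion` are hypothetical and are not taken from this PR or from the REPL code.

```julia
# Mark which characters of `suggestion` result from an insertion or a
# substitution when turning `typed` into `suggestion`. Deletions from
# `typed` leave no character in `suggestion`, so they are not marked.
function edit_mask(typed::AbstractString, suggestion::AbstractString)
    s, t = collect(typed), collect(suggestion)
    m, n = length(s), length(t)
    # d[i + 1, j + 1] = edit distance between the first i chars of s
    # and the first j chars of t
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        d[i + 1, j + 1] = min(d[i, j + 1] + 1,                       # deletion
                              d[i + 1, j] + 1,                       # insertion
                              d[i, j] + (s[i] == t[j] ? 0 : 1))      # match / substitution
    end
    # Walk back from the bottom-right corner and record the edits.
    mask = falses(n)
    i, j = m, n
    while j > 0
        if i > 0 && d[i + 1, j + 1] == d[i, j] + (s[i] == t[j] ? 0 : 1)
            mask[j] = s[i] != t[j]   # substitution is marked, a match is not
            i -= 1
            j -= 1
        elseif i > 0 && d[i + 1, j + 1] == d[i, j + 1] + 1
            i -= 1                   # deletion from `typed`: nothing to mark
        else
            mask[j] = true           # insertion
            j -= 1
        end
    end
    return mask
end

# Print the suggestion with the marked characters styled (illustrative styling).
function print_suggestion(io::IO, typed::AbstractString, suggestion::AbstractString)
    for (c, marked) in zip(suggestion, edit_mask(typed, suggestion))
        printstyled(io, c; color = marked ? :green : :normal, bold = marked)
    end
end
```

With this sketch, `edit_mask("colr", "color")` marks only the inserted `o`, and a transposed pair such as `edit_mask("ssoa", "ssao")` marks both swapped characters, because a transposition is counted as two substitutions under plain Levenshtein.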
Cool stuff! :) I wanted this very early on for Makie but never got to it! |
Neat! This would be quite helpful to have 🙂 Having implemented something similar in DataToolkit, I would recommend going for Optimal String Alignment (aka restricted Damerau–Levenshtein distance) to identify the most similar matches* (i.e. what to suggest), and then doing highlighting based on the longest common subsequence or edits. *I think this is better than "plain" Levenshtein particularly when targeting typos, since it supports a single "transposition" operation meaning that "teh" and "the" only have a distance of 1, not 2. The fuzzyscore update that Jakob PR'd to REPL a while ago uses this, and I've since implemented a variant of the algorithm in DataToolkit that discounts case-change substitutions to half cost (again with the aim of better catching typos). As an aside, this sort of functionality keeps on popping up as a "nice to have". Depending on how I go with my plans for nice error messages, that may provide a central implementation so packages don't need to reimplement this stuff, but I imagine packages like Makie won't be able to use that till whatever future version of Julia that is becomes an LTS, so this is really just a speculative optimistic aside :) |
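For readers unfamiliar with the distinction, the following is a minimal sketch of the Optimal String Alignment distance next to plain Levenshtein. It is a textbook formulation, not the DataToolkit or REPL implementation, and it leaves out the half-cost case-change variant mentioned above. The function name `osa_distance` and the `transpositions` keyword are hypothetical.

```julia
# Optimal String Alignment (restricted Damerau-Levenshtein) distance.
# With `transpositions = false` this reduces to plain Levenshtein.
function osa_distance(a::AbstractString, b::AbstractString; transpositions::Bool = true)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    # d[i + 1, j + 1] = distance between the first i chars of s
    # and the first j chars of t
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        best = min(d[i, j + 1] + 1,    # deletion
                   d[i + 1, j] + 1,    # insertion
                   d[i, j] + cost)     # match / substitution
        # Swapping two adjacent characters counts as a single edit.
        if transpositions && i > 1 && j > 1 && s[i] == t[j - 1] && s[i - 1] == t[j]
            best = min(best, d[i - 1, j - 1] + 1)
        end
        d[i + 1, j + 1] = best
    end
    return d[m + 1, n + 1]
end
```

Here `osa_distance("teh", "the")` is 1, while `osa_distance("teh", "the"; transpositions = false)` is 2, which is the difference described in the comment above.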
The highlighting is already done using edits, so are you suggesting to just change the method used for finding the best suggestion? If you can link to a script that implements what you are suggesting, I can incorporate it into here. |
Yup, for display of candidates showing insertions(/deletions) is solid, I just thought I'd mention LCS as well. The main suggestion I have is using OSA/RDL over plain Levenshtein. Here's where I implement the case-adjusted version I reference: https://github.com/tecosaur/DataToolkit.jl/blob/main/Core/src/model/utils.jl#L125 |
So, is the suggestion to do something like

    function _find_candidate(search::String, candidates::Vector{String})
        scores = map(candidates) do cand
            _stringsimilarity(search, cand; halfcase=true)
        end
        candidates = candidates[sortperm(scores)] # or just do findmax
        valid = candidates[1] < ??? #
        return candidates[1], valid # Only return one suggestion per search
    end

    function find_nearby_attributes(attributes, candidates)
        d = Vector{Tuple{String, Bool}}(undef, length(attributes))
        any_close = false
        for (i, attr) in enumerate(attributes)
            candidate, valid = _find_candidate(String(attr), candidates)
            any_close = any_close || valid
            d[i] = (candidate, valid)
        end
        return d, any_close
    end

and retain |
In the prototype detailed error messages library I'm working on I've put more into this than you probably want to here and used a dynamic similarity threshold based on the length of the reference string, the number of alternatives, and the distribution of the similarity scores of the alternatives. I'm pretty happy with how all that works together, but I suspect it's more than you'd want here. In DataToolkit I'm currently just applying a threshold of |
Thanks! I will try and apply it over the weekend. |
I haven't forgotten about this but I don't really have the energy to re-implement the algorithms as suggested by @tecosaur. I'm satisfied with this as-is but otherwise someone else should feel free to finish this last step. |
That's pretty understandable, this just happens to be an area where I've put in particular effort. FWIW, the code I linked should be pretty copy+paste friendly, if anybody is interested in giving it a look. |
Yeah, I do agree that it would be a good addition to the PR. Just been busy and wanted to give a heads up that I probably won't be the one to finish the job here instead of leaving it completely abandoned 😅 |
Should we just merge this then, since it's a great improvement? |
Was about to say the same. We can always improve such things in patch releases |
I guess the failure is due to #4587 updating refimages |
Description
This PR adds a bit more to the errors thrown from InvalidAttributeError, now also showing nearby attributes to those that the user provides. I think this could be useful since, for some plots, there are so many attributes to look through. Suggestions are only shown for close attributes, and not for ones that are too far from any other attribute. The mechanism for this is copied from the method used in the REPL for providing suggestions (I copied the functions over directly since none of that stuff is public API).
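As a rough illustration of the "only suggest close attributes" behaviour described above, here is a self-contained sketch. It is not the code in this PR and not the REPL's implementation: the helper names `levenshtein` and `suggest`, the attribute list, and the cutoff `maxdist = 2` are all assumptions made for the example.

```julia
# Plain Levenshtein distance, computed with a single rolling row.
function levenshtein(a::AbstractString, b::AbstractString)
    t = collect(b)
    row = collect(0:length(t))          # distances from "" to each prefix of b
    for (i, ca) in enumerate(a)
        prev_diag, row[1] = row[1], i   # prev_diag holds the previous row's value at j - 1
        for (j, cb) in enumerate(t)
            cur = min(row[j + 1] + 1,                   # deletion
                      row[j] + 1,                       # insertion
                      prev_diag + (ca == cb ? 0 : 1))   # match / substitution
            prev_diag, row[j + 1] = row[j + 1], cur
        end
    end
    return row[end]
end

# Return the closest allowed attribute, or `nothing` when every candidate
# is further away than `maxdist` (an illustrative cutoff).
function suggest(typed::Symbol, allowed::Vector{Symbol}; maxdist::Int = 2)
    isempty(allowed) && return nothing
    dists = [levenshtein(String(typed), String(a)) for a in allowed]
    best, idx = findmin(dists)
    return best <= maxdist ? allowed[idx] : nothing
end

# Hypothetical attribute list, for demonstration only.
allowed = [:color, :colormap, :markersize, :strokewidth]
suggest(:colour, allowed)   # close to :color, so a suggestion is returned
suggest(:foobar, allowed)   # too far from everything, so no suggestion
```

A fixed cutoff is the simplest choice; a cutoff that scales with the length of the typed name would be a natural refinement.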
Some examples:
Here's an image to also show the styling.
Type of change
Checklist