Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show a "Did you mean:" suggestion for thrown InvalidAttributeErrors #4394

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

DanielVandH
Copy link
Contributor

@DanielVandH DanielVandH commented Sep 21, 2024

Description

This PR adds a bit more to the errors thrown from InvalidAttributeError, now also showing nearby attributes to those that the user provides. I think this could be useful since, for some plots, there are so many attributes to look through.

Suggestions are only shown for close attributes, and not for ones that are too far from any other attribute. The mechanism for this is copied from the method used in the REPL for providing suggestions (I copied the functions over directly since none of that stuff is public API).

Some examples:

julia> lines(1:10, linestl = 1)
ERROR: Invalid attribute linestl for plot type Lines{Tuple{Vector{Point{2, Float64}}}}.

Did you mean linestyle?

The available plot attributes for Lines{Tuple{Vector{Point{2, Float64}}}} are:

alpha        color     colorrange  cycle        fxaa      inspectable      inspector_hover  joinstyle  linestyle  lowclip      model      overdraw  ssao            transparency
clip_planes  colormap  colorscale  depth_shift  highclip  inspector_clear  inspector_label  linecap    linewidth  miter_limit  nan_color  space     transformation  visible

julia> triplot(rand(2, 50), ghost_edge_extension_factr = 1, markerclr = 2, blahblahblahblah = 10) # don't show suggestions for distant kwargs. This example is a bit weird since it prints the whole Plot type with parameters (not because of this PR, also on master)
ERROR: Invalid attributes markerclr, ghost_edge_extension_factr and blahblahblahblah for plot type Plot{Makie.triplot, Tuple{DelaunayTriangulation.Triangulation{Vector{Point{2, Float64}}, Set{Tuple{Int64, Int64, Int64}}, Vector{Int64}, DelaunayTriangulation.ZeroWeight{Float64}, Int64, Tuple{Int64, Int64}, Set{Tuple{Int64, Int64}}, Tuple{}, Dict{Tuple{Int64, Int64}, Tuple{Vector{Int64}, Int64}}, Dict{Int64, Vector{Int64}}, Dict{Int64, UnitRange{Int64}}, Dict{Int64, DelaunayTriangulation.RepresentativeCoordinates{Int64, Float64}}, DelaunayTriangulation.TriangulationCache{DelaunayTriangulation.Triangulation{Vector{Point{2, Float64}}, Set{Tuple{Int64, Int64, Int64}}, Vector{Int64}, DelaunayTriangulation.ZeroWeight{Float64}, Int64, Tuple{Int64, Int64}, Set{Tuple{Int64, Int64}}, Tuple{}, Dict{Tuple{Int64, Int64}, Tuple{Vector{Int64}, Int64}}, Dict{Int64, Vector{Int64}}, Dict{Int64, UnitRange{Int64}}, Dict{Int64, DelaunayTriangulation.RepresentativeCoordinates{Int64, Float64}}, DelaunayTriangulation.TriangulationCache{Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}, Vector{Int64}, Set{Tuple{Int64, Int64}}, Vector{Int64}, Set{Tuple{Int64, Int64, Int64}}}, Nothing}}}.

Did you mean:
  - markerclr -> markercolor?
  - ghost_edge_extension_factr -> ghost_edge_extension_factor?


The available plot attributes for Plot{Makie.triplot, Tuple{DelaunayTriangulation.Triangulation{Vector{Point{2, Float64}}, Set{Tuple{Int64, Int64, Int64}}, Vector{Int64}, DelaunayTriangulation.ZeroWeight{Float64}, Int64, Tuple{Int64, Int64}, Set{Tuple{Int64, Int64}}, Tuple{}, Dict{Tuple{Int64, Int64}, Tuple{Vector{Int64}, Int64}}, Dict{Int64, Vector{Int64}}, Dict{Int64, UnitRange{Int64}}, Dict{Int64, DelaunayTriangulation.RepresentativeCoordinates{Int64, Float64}}, DelaunayTriangulation.TriangulationCache{DelaunayTriangulation.Triangulation{Vector{Point{2, Float64}}, Set{Tuple{Int64, Int64, Int64}}, Vector{Int64}, DelaunayTriangulation.ZeroWeight{Float64}, Int64, Tuple{Int64, Int64}, Set{Tuple{Int64, Int64}}, Tuple{}, Dict{Tuple{Int64, Int64}, Tuple{Vector{Int64}, Int64}}, Dict{Int64, Vector{Int64}}, Dict{Int64, UnitRange{Int64}}, Dict{Int64, DelaunayTriangulation.RepresentativeCoordinates{Int64, Float64}}, DelaunayTriangulation.TriangulationCache{Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}, Vector{Int64}, Set{Tuple{Int64, Int64}}, Vector{Int64}, Set{Tuple{Int64, Int64, Int64}}}, Nothing}}} are:

bounding_box                constrained_edge_linewidth  convex_hull_linewidth        ghost_edge_linestyle  linecap    markercolor  recompute_centers       show_ghost_edges  strokewidth
constrained_edge_color      convex_hull_color           ghost_edge_color             ghost_edge_linewidth  linestyle  markersize   show_constrained_edges  show_points       triangle_color
constrained_edge_linestyle  convex_hull_linestyle       ghost_edge_extension_factor  joinstyle             marker     miter_limit  show_convex_hull        strokecolor


Generic attributes are:

clip_planes  cycle  dim_conversions  label  model  rasterize  transformation  xautolimits  yautolimits  zautolimits

ulia> scatter([0.0], [0.0], colour=2)
ERROR: Invalid attribute colour for plot type Scatter{Tuple{Vector{Point{2, Float64}}}}.

Did you mean color?

The available plot attributes for Scatter{Tuple{Vector{Point{2, Float64}}}} are:

alpha        colormap    cycle         distancefield  glowwidth    inspector_clear  lowclip        markersize   nan_color  space        strokewidth       transparency
clip_planes  colorrange  depth_shift   fxaa           highclip     inspector_hover  marker         markerspace  overdraw   ssao         transform_marker  uv_offset_width
color        colorscale  depthsorting  glowcolor      inspectable  inspector_label  marker_offset  model        rotation   strokecolor  transformation    visible


Generic attributes are:

clip_planes  cycle  dim_conversions  label  model  rasterize  transformation  xautolimits  yautolimits  zautolimits

julia> scatter([0.0], [0.0], blahblablah=2) # nothing is shown when no suggestions are appropriate
ERROR: Invalid attribute blahblablah for plot type Scatter{Tuple{Vector{Point{2, Float64}}}}.

The available plot attributes for Scatter{Tuple{Vector{Point{2, Float64}}}} are:

alpha        colormap    cycle         distancefield  glowwidth    inspector_clear  lowclip        markersize   nan_color  space        strokewidth       transparency
clip_planes  colorrange  depth_shift   fxaa           highclip     inspector_hover  marker         markerspace  overdraw   ssao         transform_marker  uv_offset_width
color        colorscale  depthsorting  glowcolor      inspectable  inspector_label  marker_offset  model        rotation   strokecolor  transformation    visible


Generic attributes are:

clip_planes  cycle  dim_conversions  label  model  rasterize  transformation  xautolimits  yautolimits  zautolimits

Here's an image to also show the styling.

image

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • Added an entry in CHANGELOG.md (for new features and breaking changes)
  • Added unit tests for new algorithms, conversion methods, etc.

@jkrumbiegel
Copy link
Member

I think I'd not print markerclr -> markercolor for these typos but just markercolor with the changed or added letters colored in. That's easier to see if you care about how you misspelled something, and otherwise it's already written above what you wrote.

@DanielVandH
Copy link
Contributor Author

You mean like

"
Did you mean strokewidth and color?
"

I can try that. I don't know how that will look for suggestions that aren't just missing typos but include transposed letters for example. I can probably just highlight the difference between the longest common subsequence.

I put it as a list since not all attributes will have suggestions necessarily, so e.g. the example above might not match the order of the passed attributes if it was (strokewidt, blahblahblah, clr) (it wouldn't show anything for blahblahblah). But if you think it's fine that they don't necessarily match I can try that.

@jkrumbiegel
Copy link
Member

I thought these algorithms that find similar options go by edit distance, so I would mark every edit (except deletions) if possible.

@DanielVandH
Copy link
Contributor Author

image

How about this?

@jkrumbiegel
Copy link
Member

Cool :) I guess I would have expected both a and o in ssao to be colored but that's minor

@DanielVandH
Copy link
Contributor Author

DanielVandH commented Sep 21, 2024

Yeah I'm not sure why that substitution isn't highlighted.. maybe it's counted as a deletion (I don't show the deletions since it makes it harder to read, only additions and substitutions)

@DanielVandH
Copy link
Contributor Author

image

Fixed the ssao issue

@SimonDanisch
Copy link
Member

Cool stuff! :) I wanted this very early on for Makie but never got to it!

@tecosaur
Copy link

Neat! This would be quite helpful to have 🙂

Having implemented something similar in DataToolkit, I would recommend going for Optimal String Alignment (aka restricted Damerau–Levenshtein distance) to identify the most similar matches* (i.e. what to suggest), and then doing highlighting based on the longest common subsequence or edits.

*I think this is better than "plain" Levenshtein particularly when targeting typos, since it supports a single "transposition" operation meaning that "teh" and "the" only have a distance of 1, not 2. The fuzzyscore update that Jakob PR'd to REPL a while ago uses this, and I've since implemented a variant of the algorithm in in DataToolkit that discounts case-change substitutions to half cost (again with the aim of better catching typos).

As an aside, this sort of functionality keeps on popping up as a "nice to have". Depending on how I go with my plans for nice error messages, that may provide a central implementation so packages don't need to reimplement this stuff, but I imagine packages like Makie won't be able to use that till whatever future version of Julia that is becomes an LTS, so this is really just a speculative optimistic aside :)

@DanielVandH
Copy link
Contributor Author

The highlighting is already done using edits, so are you suggesting to just change the method used for finding the best suggestion? If you can link to a script that implements what you are suggesting, I can incorporate it into here.

@tecosaur
Copy link

Yup, for display of candidates showing insertions(/deletions) is solid, I just thought I'd mention LCS as well.

The main suggestion I have is using OSA/RDL over plain Levenshtein. Here's where I implement the case-adjusted version I reference: https://github.com/tecosaur/DataToolkit.jl/blob/main/Core/src/model/utils.jl#L125

@DanielVandH
Copy link
Contributor Author

DanielVandH commented Sep 25, 2024

So, is the suggestion to do something like

function _find_candidate(search::String, candidates::Vector{String})
    scores = map(candidates) do cand 
            _stringsimilarity(search, cand; halfcase=true)
    end
    candidates = candidates[sortperm(scores)] # or just do findmax
    valid = candidates[1] < ??? # 
    return candidates[1], valid # Only return one suggestion per search
end
function find_nearby_attributes(attributes, candidates)
    d = Vector{Tuple{String, Bool}}(undef, length(attributes)) 
    any_close = false
    for (i, attr) in enumerate(attributes)
        candidate, valid = _find_candidate(String(attr), candidates)
        any_close = any_close || valid
        d[i] = (candidate, valid)
    end
    return d, any_close
end

and retain textdiff as is? What's a good threshold to use here valid = candidates[1] < ????

@tecosaur
Copy link

In the prototype detailed error messages library I'm working on I've put more into this than you probably want to here and used a dynamic similarity threshold based on the length of the reference string, the number of alternatives, and the distribution of the similarity scores of the alternatives.

I'm pretty happy with how all that works together, but I suspect it's more than you'd want here. In DataToolkit I'm currently just applying a threshold of 0.5 and that's ...fine. Could be better could be worse.

@DanielVandH
Copy link
Contributor Author

Thanks! I will try and apply it over the weekend.

@DanielVandH
Copy link
Contributor Author

I haven't forgotten about this but I don't really have the energy to re-implement the algorithms as suggested by @tecosaur. I'm satisfied with this as-is but otherwise someone else should feel free to finish this last step.

@tecosaur
Copy link

That's pretty understandable, this just happens to be an area where I've put in particular effort.

FWIW, the code I linked should be pretty copy+paste friendly, if anybody is interested in giving it a look.

@DanielVandH
Copy link
Contributor Author

Yeah, I do agree that it would be a good addition to the PR. Just been busy and wanted to give a heads up that I probably won't be the one to finish the job here instead of leaving it completely abandoned 😅

@SimonDanisch
Copy link
Member

Should we just merge this then, since it's a great improvement?
And once somebody has time, we'll have a new PR?
CC @jkrumbiegel

@jkrumbiegel
Copy link
Member

Was about to say the same. We can always improve such things in patch releases

@SimonDanisch
Copy link
Member

SimonDanisch commented Nov 12, 2024

I guess the failure is due to #4587 updating refimages

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Closed
Development

Successfully merging this pull request may close these issues.

5 participants