Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow easy definition of sanitized substitutions #264

Closed
jlstevens opened this issue Sep 10, 2015 · 15 comments
Closed

Allow easy definition of sanitized substitutions #264

jlstevens opened this issue Sep 10, 2015 · 15 comments
Assignees
Labels
type: feature A major new feature
Milestone

Comments

@jlstevens
Copy link
Contributor

For the next release, we should make sure it is easy for the user to define convenient substitutions for the sanitized strings, particularly dimension names (which can get very unwieldy, especially when LateX is involved)

One suggestion is to allow definition of an alias at the dimension level e.g by specifying the name as a tuple. Alternatively, we should extend the formatter system to handle custom string sanitation and the idea of a tuple for a name could just be a shortcut to defining the appropriate formatter.

@jlstevens jlstevens added the type: feature A major new feature label Sep 10, 2015
@philippjfr philippjfr added this to the v1.4.0 milestone Sep 11, 2015
@jlstevens
Copy link
Contributor Author

I feel I should link to Issue #251 as that issue also deals with dimension formatters. I say we need to think careful about improving dimension formatters in general.

@philippjfr
Copy link
Member

They're different formatters though, one is for the values the other is for the name. Apart from making sure the naming makes it clear which is which there isn't much of a link between the two.

@jlstevens
Copy link
Contributor Author

The link is that we want a consistent system for defining multiple types of formatters. In particular, it should be easy to define these formatters, we want a consistent way of defining such formatters, and finally we need to worry about pickling issues (e.g if a lamda function is used).

For this reason, I think issue #251 is related. There is also the issue of formatting units to consider.

@philippjfr philippjfr mentioned this issue Nov 10, 2015
42 tasks
@philippjfr
Copy link
Member

Can we come to a decision here? Currently we have two formatter parameters on Dimensions:

  • formatter - Defines the formatting of the dimension values and ticks
  • format_string - Defines the display of the quantity along with the dimension, e.g. {name}: {val} {unit} becomes Weight: 10 kg

Adding one additional formatter for the name itself along with some easy way to define this as an alias seems like the straightforward and sensible option here. By default this formatter would probably just be {name} so it mirrors the defined dimension name but can be overridden with any valid formatting string. I would suggest calling the parameter label_formatter and then allowing both the name and the alias/formatter to be defined as a tuple in the Dimension constructor, e.g. Dimension(('phi', '$\{name}$')) would define the name phi for easy access and the label_formatter $\{name}$, which would render the label nicely in LaTeX.

I also don't think we need to concern ourselves with dimension formatting any further for now. I know @jlstevens at least is concerned about proliferation of parameters on Dimensions but I think at least formatter and the new label_formatter are very well defined and clear, if we had those two I'd be happy to rename/rethink any other parameters controlling the labeling, such as the aforementioned format_string.

@jbednar
Copy link
Member

jbednar commented Nov 17, 2015

Sounds fine to me. Renaming format_string and/or formatter does seem appropriate, since there is no clue in their names how they are different from each other. label_formatter seems fine.

@jlstevens
Copy link
Contributor Author

I agree this needs to be addressed for this release (i.e now). The old names were terrible so I am looking forward to something more meaningful! Here is what I understand of the proposal above:

  • format_string will be renamed to title_format and will no longer be on Dimension objects. I am happy to let this be a global variable in dimension.py. This technically breaks backwards compatibility but neither Philipp or I think anyone will have modified it.
  • formatter will be renamed to value_format and we will add backwards compatibility (with a deprecation warning) for the old version.
  • label_format will handle axis labels which was previously done by the name. This new parameter allows easier use of LaTeX and can be set directly or using the convenient tuple format shown above. As Philipp points out, setting something like label_formatter='$\{name}$' at the class level would help anyone who uses greek letter names often. The default value for label_formatter would be '{name}' to preserve the old default behavior.
  • The dimension name should now be simple and attribute friendly.

Ok, I am happy with this! I dislike new parameters but I dislike parameters with meaningless names even more (my main gripe from before). The new names are much clearer, and moving title_format off the dimension object and adding label_format does seem to offer something more useful than before.

@jbednar
Copy link
Member

jbednar commented Nov 17, 2015

Sounds like a clean proposal.

I guess that adding a lookup table containing all the Greek letters (upper and lower case) and translating anything found in the table to their LaTeX equivalents is too tricksy? That wouldn't cover other common cases like letters with super or subscripts, of course.

As it is, the label_formatter will have to be set for each and every dimension that's a Greek letter, which seems painful (though better than before). I'm not actually arguing for the pre-populated global lookup table, just making sure you've considered it already and rejected it as making too many assumptions.

@jlstevens
Copy link
Contributor Author

@jbednar Your suggestion here is along the lines something I've just been discussing with Philipp. I still like most of the proposal above but maybe label_formatter is unnecessary.

My objection is that this avoids sanitization of dimension names, but doesn't let the user define aliases for groups and labels. Here is a new suggestion to consider:

hv.substitutions(dimension={'\$phi$':'phi', '$\alpha$':'alpha'},
                 group={'controls':'Experimental Controls', 
                        'group1':'My Group with $pec!al characters'},
                 label = {'Red':'Red Channel'})

The idea is that sanitization is applied if the appropriate substitution is not defined. Note that there are 'namespaces' here (unlike the current sanitization substitution system) and that it should be easy to use by making it available at the hv level. Finally, this would make Jim's proposal possible if the user wants to set up the lookup table of Greek letters.

Again, this is just an idea to consider. As always, it is best to come up with a general solution if possible!

Edit: Probably no need to make it a function when a simple dictionary could work. Something like this could suffice:

hv.substitutions.dimension= dict(phi='\$phi$', alpha='$\alpha$')
hv.substitutions.group=dict(controls='Experimental Controls', 
                            group1='My Group with $pec!al characters')
hv.substitutions.label = dict(Red='Red Channel')

@philippjfr
Copy link
Member

I'd be okay with this but it's another system for the user to know about, but I do think it might be a good idea. There's is an inconsistency in your example though, which highlights an issue with the proposal. The group/label mappings go form short-form to long-form, while the Dimension does the opposite. So I'm somewhat unclear in which direction you think the substitutions should go. Should the user always define groups and labels that are already valid identifiers and the substitutions define the long-form used for titles (in which case group1 would be rejected)? Similarly does the user define the long-form or short-form as the Dimension name?

@jlstevens
Copy link
Contributor Author

I think the example I've added in the edit makes the most sense to me. Here it is again:

hv.substitutions.dimension= dict(phi='\$phi$', alpha='$\alpha$')
hv.substitutions.group=dict(controls='Experimental Controls', 
                            group1='My Group with $pec!al characters')
hv.substitutions.label = dict(Red='Red Channel')

This way round helps make sure the sanitized input is valid (python does that for you). Although it is a new system for users to know about, users would have to learn about label_format as well. In addition, this approach would be more powerful as it could help customize sanitization everywhere.

We already have two instances of sanitize_identifier_fn:

sanitize_identifier = sanitize_identifier_fn.instance()
dimension_sanitizer = sanitize_identifier_fn.instance(capitalize=False)

The new suggestion would hook into these instances to define the substitutions. Not sure if we want to keep group/label substitutions together or introduce label_sanitizer and group_sanitizer separately.

My other concern is about making this substitution list effectively global. That comes with pros and cons too...

@jlstevens
Copy link
Contributor Author

A third alternative (that I might like the most) is to extend the same short tuple proposal for Dimension to group and label. Then it is convenient, consistent, distributed and simple!

If you wanted:

hv.Curve(label=('label1','My Long Label'), group=('grp', 'Money in $'),
               kdims=[hv.Dimension(('dollars', '$'))])

Bit awkward but you can use tuples as 'objects' for these aliases:

label1 = ('label1','My Long Label')
money = ('money', 'Money in % profits')
dollars = ('dollars', '$')

Then the definition would be:

hv.Curve(label=label1, group=money, kdims=[hv.Dimension(dollars)])

I think this might be a simpler and more consistent approach. What do you think?

@jlstevens
Copy link
Contributor Author

My example above bothers me as I wouldn't want to clutter the namespace with random names. Just to tidy things up (the fundamental suggestion of working with tuples is unchanged) you can imagine a simple hv.aliases helper function:

labels = hv.aliases(my_label='Some Label', my_other_label='Some Other Label')
groups = hv.aliases(money = '$$')
hv.Curve(label=labels.my_label, group=groups.money)

This would give you 1. tab completion (convenient!) 2. a cleaner namespace 3. and would be simply a shortcut for supplying the tuples (and therefore entirely optional).

I think I really like this proposal. What are your thoughts on this?

@jbednar
Copy link
Member

jbednar commented Nov 17, 2015

Seems good to me.

@philippjfr philippjfr self-assigned this Nov 23, 2015
@jlstevens
Copy link
Contributor Author

As I've dealt with the other issues assigned to me for the 1.4 release, I'll tackle this now. Philipp is currently working on an important PR and has a couple of issues outstanding (most of which I think will be easy fixes).

@jlstevens
Copy link
Contributor Author

As the PR is now merged, I can finally close this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature A major new feature
Projects
None yet
Development

No branches or pull requests

3 participants