Allow easy definition of sanitized substitutions #264

jlstevens · 2015-09-10T16:19:04Z

For the next release, we should make sure it is easy for the user to define convenient substitutions for the sanitized strings, particularly dimension names (which can get very unwieldy, especially when LateX is involved)

One suggestion is to allow definition of an alias at the dimension level e.g by specifying the name as a tuple. Alternatively, we should extend the formatter system to handle custom string sanitation and the idea of a tuple for a name could just be a shortcut to defining the appropriate formatter.

jlstevens · 2015-09-17T13:04:01Z

I feel I should link to Issue #251 as that issue also deals with dimension formatters. I say we need to think careful about improving dimension formatters in general.

philippjfr · 2015-09-17T13:05:54Z

They're different formatters though, one is for the values the other is for the name. Apart from making sure the naming makes it clear which is which there isn't much of a link between the two.

jlstevens · 2015-09-17T13:11:45Z

The link is that we want a consistent system for defining multiple types of formatters. In particular, it should be easy to define these formatters, we want a consistent way of defining such formatters, and finally we need to worry about pickling issues (e.g if a lamda function is used).

For this reason, I think issue #251 is related. There is also the issue of formatting units to consider.

philippjfr · 2015-11-17T13:49:47Z

Can we come to a decision here? Currently we have two formatter parameters on Dimensions:

formatter - Defines the formatting of the dimension values and ticks
format_string - Defines the display of the quantity along with the dimension, e.g. {name}: {val} {unit} becomes Weight: 10 kg

Adding one additional formatter for the name itself along with some easy way to define this as an alias seems like the straightforward and sensible option here. By default this formatter would probably just be {name} so it mirrors the defined dimension name but can be overridden with any valid formatting string. I would suggest calling the parameter label_formatter and then allowing both the name and the alias/formatter to be defined as a tuple in the Dimension constructor, e.g. Dimension(('phi', '$\{name}$')) would define the name phi for easy access and the label_formatter $\{name}$ , which would render the label nicely in LaTeX.

I also don't think we need to concern ourselves with dimension formatting any further for now. I know @jlstevens at least is concerned about proliferation of parameters on Dimensions but I think at least formatter and the new label_formatter are very well defined and clear, if we had those two I'd be happy to rename/rethink any other parameters controlling the labeling, such as the aforementioned format_string.

jbednar · 2015-11-17T14:53:24Z

Sounds fine to me. Renaming format_string and/or formatter does seem appropriate, since there is no clue in their names how they are different from each other. label_formatter seems fine.

jlstevens · 2015-11-17T15:42:00Z

I agree this needs to be addressed for this release (i.e now). The old names were terrible so I am looking forward to something more meaningful! Here is what I understand of the proposal above:

format_string will be renamed to title_format and will no longer be on Dimension objects. I am happy to let this be a global variable in dimension.py. This technically breaks backwards compatibility but neither Philipp or I think anyone will have modified it.
formatter will be renamed to value_format and we will add backwards compatibility (with a deprecation warning) for the old version.
label_format will handle axis labels which was previously done by the name. This new parameter allows easier use of LaTeX and can be set directly or using the convenient tuple format shown above. As Philipp points out, setting something like label_formatter='$\{name}$' at the class level would help anyone who uses greek letter names often. The default value for label_formatter would be '{name}' to preserve the old default behavior.
The dimension name should now be simple and attribute friendly.

Ok, I am happy with this! I dislike new parameters but I dislike parameters with meaningless names even more (my main gripe from before). The new names are much clearer, and moving title_format off the dimension object and adding label_format does seem to offer something more useful than before.

jbednar · 2015-11-17T15:54:55Z

Sounds like a clean proposal.

I guess that adding a lookup table containing all the Greek letters (upper and lower case) and translating anything found in the table to their LaTeX equivalents is too tricksy? That wouldn't cover other common cases like letters with super or subscripts, of course.

As it is, the label_formatter will have to be set for each and every dimension that's a Greek letter, which seems painful (though better than before). I'm not actually arguing for the pre-populated global lookup table, just making sure you've considered it already and rejected it as making too many assumptions.

jlstevens · 2015-11-17T16:08:32Z

@jbednar Your suggestion here is along the lines something I've just been discussing with Philipp. I still like most of the proposal above but maybe label_formatter is unnecessary.

My objection is that this avoids sanitization of dimension names, but doesn't let the user define aliases for groups and labels. Here is a new suggestion to consider:

hv.substitutions(dimension={'\$phi$':'phi', '$\alpha$':'alpha'},
                 group={'controls':'Experimental Controls', 
                        'group1':'My Group with $pec!al characters'},
                 label = {'Red':'Red Channel'})

The idea is that sanitization is applied if the appropriate substitution is not defined. Note that there are 'namespaces' here (unlike the current sanitization substitution system) and that it should be easy to use by making it available at the hv level. Finally, this would make Jim's proposal possible if the user wants to set up the lookup table of Greek letters.

Again, this is just an idea to consider. As always, it is best to come up with a general solution if possible!

Edit: Probably no need to make it a function when a simple dictionary could work. Something like this could suffice:

hv.substitutions.dimension= dict(phi='\$phi$', alpha='$\alpha$')
hv.substitutions.group=dict(controls='Experimental Controls', 
                            group1='My Group with $pec!al characters')
hv.substitutions.label = dict(Red='Red Channel')

philippjfr · 2015-11-17T16:17:41Z

I'd be okay with this but it's another system for the user to know about, but I do think it might be a good idea. There's is an inconsistency in your example though, which highlights an issue with the proposal. The group/label mappings go form short-form to long-form, while the Dimension does the opposite. So I'm somewhat unclear in which direction you think the substitutions should go. Should the user always define groups and labels that are already valid identifiers and the substitutions define the long-form used for titles (in which case group1 would be rejected)? Similarly does the user define the long-form or short-form as the Dimension name?

jlstevens · 2015-11-17T16:23:55Z

I think the example I've added in the edit makes the most sense to me. Here it is again:

hv.substitutions.dimension= dict(phi='\$phi$', alpha='$\alpha$')
hv.substitutions.group=dict(controls='Experimental Controls', 
                            group1='My Group with $pec!al characters')
hv.substitutions.label = dict(Red='Red Channel')

This way round helps make sure the sanitized input is valid (python does that for you). Although it is a new system for users to know about, users would have to learn about label_format as well. In addition, this approach would be more powerful as it could help customize sanitization everywhere.

We already have two instances of sanitize_identifier_fn:

sanitize_identifier = sanitize_identifier_fn.instance()
dimension_sanitizer = sanitize_identifier_fn.instance(capitalize=False)

The new suggestion would hook into these instances to define the substitutions. Not sure if we want to keep group/label substitutions together or introduce label_sanitizer and group_sanitizer separately.

My other concern is about making this substitution list effectively global. That comes with pros and cons too...

jlstevens · 2015-11-17T16:42:35Z

A third alternative (that I might like the most) is to extend the same short tuple proposal for Dimension to group and label. Then it is convenient, consistent, distributed and simple!

If you wanted:

hv.Curve(label=('label1','My Long Label'), group=('grp', 'Money in $'),
               kdims=[hv.Dimension(('dollars', '$'))])

Bit awkward but you can use tuples as 'objects' for these aliases:

label1 = ('label1','My Long Label')
money = ('money', 'Money in % profits')
dollars = ('dollars', '$')

Then the definition would be:

hv.Curve(label=label1, group=money, kdims=[hv.Dimension(dollars)])

I think this might be a simpler and more consistent approach. What do you think?

jlstevens · 2015-11-17T17:03:52Z

My example above bothers me as I wouldn't want to clutter the namespace with random names. Just to tidy things up (the fundamental suggestion of working with tuples is unchanged) you can imagine a simple hv.aliases helper function:

labels = hv.aliases(my_label='Some Label', my_other_label='Some Other Label')
groups = hv.aliases(money = '$$')
hv.Curve(label=labels.my_label, group=groups.money)

This would give you 1. tab completion (convenient!) 2. a cleaner namespace 3. and would be simply a shortcut for supplying the tuples (and therefore entirely optional).

I think I really like this proposal. What are your thoughts on this?

jbednar · 2015-11-17T17:15:29Z

Seems good to me.

jlstevens · 2015-11-30T16:31:43Z

As I've dealt with the other issues assigned to me for the 1.4 release, I'll tackle this now. Philipp is currently working on an important PR and has a couple of issues outstanding (most of which I think will be easy fixes).

jlstevens · 2015-12-02T00:57:39Z

As the PR is now merged, I can finally close this issue!

jlstevens added the type: feature A major new feature label Sep 10, 2015

philippjfr added this to the v1.4.0 milestone Sep 11, 2015

philippjfr mentioned this issue Nov 10, 2015

Data API #284

Merged

42 tasks

philippjfr self-assigned this Nov 23, 2015

jlstevens assigned jlstevens and unassigned philippjfr Nov 30, 2015

jlstevens mentioned this issue Dec 1, 2015

Easy definition of sanitized substitutions #312

Merged

jlstevens closed this as completed Dec 2, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow easy definition of sanitized substitutions #264

Allow easy definition of sanitized substitutions #264

jlstevens commented Sep 10, 2015

jlstevens commented Sep 17, 2015

philippjfr commented Sep 17, 2015

jlstevens commented Sep 17, 2015

philippjfr commented Nov 17, 2015

jbednar commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jbednar commented Nov 17, 2015

jlstevens commented Nov 17, 2015

philippjfr commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jbednar commented Nov 17, 2015

jlstevens commented Nov 30, 2015

jlstevens commented Dec 2, 2015

Allow easy definition of sanitized substitutions #264

Allow easy definition of sanitized substitutions #264

Comments

jlstevens commented Sep 10, 2015

jlstevens commented Sep 17, 2015

philippjfr commented Sep 17, 2015

jlstevens commented Sep 17, 2015

philippjfr commented Nov 17, 2015

jbednar commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jbednar commented Nov 17, 2015

jlstevens commented Nov 17, 2015

philippjfr commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jlstevens commented Nov 17, 2015

jbednar commented Nov 17, 2015

jlstevens commented Nov 30, 2015

jlstevens commented Dec 2, 2015