Subtraction from a cartesian product #5

Open
prabhuramachandran opened this issue May 8, 2015 · 4 comments

Comments

@prabhuramachandran
Contributor

While reading your interesting paper and experimenting with lancet, I tried the following:

from lancet import Args, Range
params = Args(arg1=1.0) * Range('arg2', 1, 3, steps=3)
ap = params - Args(arg1=1.0, arg2=2.0)

But this doesn't work. I can imagine scenarios where this is just as natural as extending a parameter space with addition, so I was wondering why it is not supported.
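The requested behaviour can be sketched in plain Python. This is not lancet's API: a parameter space is modeled as a list of dicts (one dict per run), and `cartesian` and `subtract` are hypothetical helpers standing in for `*` and the proposed `-`.

```python
# Assumed model: a parameter space is a list of dicts, one dict per run.
# `cartesian` and `subtract` are hypothetical helpers, not part of lancet.


def cartesian(a, b):
    """Merge every element of `a` with every element of `b` (like Args * Range)."""
    return [{**x, **y} for x in a for y in b]


def subtract(a, b):
    """Set-style subtraction: keep elements of `a` that equal no element of `b`."""
    return [x for x in a if x not in b]


params = cartesian([{'arg1': 1.0}], [{'arg2': v} for v in (1.0, 2.0, 3.0)])
ap = subtract(params, [{'arg1': 1.0, 'arg2': 2.0}])
print(ap)  # [{'arg1': 1.0, 'arg2': 1.0}, {'arg1': 1.0, 'arg2': 3.0}]
```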

@jlstevens
Member

Thanks for the feature request!

This functionality is something I have contemplated many times, and I was never entirely sure whether it should be implemented. The main issue is that given two parameter spaces A and B, when you try A-B there is no guarantee that the elements in B also exist in A to be subtracted. You definitely want to remove an element from A only if it exactly matches an element of B.

My current recommendation is to take a constructive approach, i.e. build up parameter spaces with + instead of using -.

That said, I can imagine cases where this doesn't work so well so I might well consider implementing the - operator that either generates a warning (or even raises an exception?) if not all the elements specified for subtraction are available for subtraction. I'm still not entirely convinced though!
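A minimal sketch of the warn-or-raise option, using an assumed list-of-dicts model of a parameter space (not lancet's API). The `on_missing` parameter is hypothetical; it decides what happens when an element of B has no exact match in A.

```python
import warnings


def subtract(a, b, on_missing='warn'):
    """Remove elements of `a` that exactly equal some element of `b`.

    If any element of `b` has no exact match in `a`, either issue a
    UserWarning listing the unmatched elements (on_missing='warn') or
    raise ValueError (on_missing='raise').
    """
    if on_missing not in ('warn', 'raise'):
        raise ValueError(f"on_missing must be 'warn' or 'raise', not {on_missing!r}")
    unmatched = [y for y in b if y not in a]
    if unmatched:
        message = f"not present in the parameter space, so not subtracted: {unmatched}"
        if on_missing == 'raise':
            raise ValueError(message)
        warnings.warn(message)
    return [x for x in a if x not in b]


space = [{'a': 1}, {'a': 2}, {'a': 3}]
print(subtract(space, [{'a': 2}]))  # [{'a': 1}, {'a': 3}]
```

Either way the unmatched elements are listed, so the user can see which part of B had no effect.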

Hopefully that helps clarify why this feature isn't quite as obviously a good idea as it might first appear!

@jbednar
Member

jbednar commented May 10, 2015

Subtraction does seem useful. I'm not sure why you are arguing that all of the elements need to be present in A for it to be a valid expression. Sometimes that may be true, but I don't see why it would be important to enforce. For example, one might know that one never wants to run simulations where parameters p and q are equal, build up an expression for that over some large range, and then subtract it from whatever A is specified in a particular run. Forcing B to be precisely tailored to A doesn't seem important in such a case.

I'm imagining something like selecting regions in the GIMP -- one can easily specify one region, then subtract other regions from it that are not entirely contained in the first region, and that's very often much easier and more useful than only being able to delete contained regions. Seems like at worst such a case should be a helpful warning, not an error.
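The p/q use case can be sketched under an assumed list-of-dicts model (hypothetical helpers, not lancet's API). Here B is a broad exclusion set built once, and elements of B that are absent from A are simply ignored rather than reported, like subtracting a GIMP region that extends past the selection.

```python
# Hypothetical helpers over a list-of-dicts model; not part of lancet.


def cartesian(a, b):
    """Merge every element of `a` with every element of `b`."""
    return [{**x, **y} for x in a for y in b]


def subtract(a, b):
    """Drop elements of `a` found in `b`; elements of `b` absent from `a` are ignored."""
    return [x for x in a if x not in b]


# Exclusion set built once over a large range: every run where p == q.
never_equal = [{'p': v, 'q': v} for v in range(100)]

# The parameter space for one particular run, much smaller than the exclusion set.
A = cartesian([{'p': p} for p in (1, 2, 3)], [{'q': q} for q in (2, 3)])
kept = subtract(A, never_equal)
print(kept)
# [{'p': 1, 'q': 2}, {'p': 1, 'q': 3}, {'p': 2, 'q': 3}, {'p': 3, 'q': 2}]
```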

@jlstevens
Member

I agree it seems useful, but I think in practice it can be quite confusing. Consider:

lancet.Range('a', 2.5, 7.8, 11) - lancet.Range('a', 3.1, 7.27, 6)

What does this specify? It is hardly obvious, but if you look at the arguments you will find it is equivalent to this:

lancet.Range('a', 2.5, 7.8, 11) - lancet.Args(a=7.27)

This is because 'a=7.27' is the only value in the first range that is also specified in the second.
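The claim can be checked with a short sketch that assumes Range produces evenly spaced, endpoint-inclusive samples and that values are compared after rounding to 4 decimal places (the default precision discussed in this thread). `linspace` and `common_values` are hypothetical helpers, not lancet functions.

```python
def linspace(start, end, steps):
    """Evenly spaced, endpoint-inclusive samples (assumed to match Range's spacing)."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def common_values(a, b, precision=4):
    """Values of `a` that match some value of `b` after rounding to `precision` digits."""
    rounded_b = {round(v, precision) for v in b}
    return [round(v, precision) for v in a if round(v, precision) in rounded_b]


first = linspace(2.5, 7.8, 11)   # 2.5, 3.03, 3.56, ..., 7.27, 7.8
second = linspace(3.1, 7.27, 6)  # 3.1, 3.934, 4.768, 5.602, 6.436, 7.27
print(common_values(first, second))  # [7.27]
```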

Fundamentally, the issue is one of set subtraction: nice and easy when the elements are drawn from a small number of discrete possibilities (e.g. a small number of distinct strings) but quite horrible when it comes to floating-point numbers.

It becomes even more horrible once you realize that the behavior will then depend on the floating-point precision specified! The default floating-point precision is 4, which means that by default the following results in no values for 'a' (the two values are deemed equal, so the subtraction occurs):

Args(a=0.1111) - Args(a=1/9.0)

But if you call lancet.set_fp_precision(5) first, the second value of 'a' no longer matches the first, so it cannot be subtracted! Do you warn, raise an exception or simply pretend the subtraction was never attempted? None of these are robust or intuitive solutions!
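The precision dependence can be reproduced in a few lines, assuming equality means "equal after rounding to the given number of decimal places". That comparison rule is an assumption for illustration; `same_at_precision` is a hypothetical helper, not lancet's implementation.

```python
def same_at_precision(x, y, precision):
    """Assumed comparison rule: floats are equal if they agree after rounding."""
    return round(x, precision) == round(y, precision)


print(same_at_precision(0.1111, 1 / 9.0, 4))  # True: both round to 0.1111
print(same_at_precision(0.1111, 1 / 9.0, 5))  # False: 0.1111 vs 0.11111
```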

The fundamental issue is one of equality/inequality between dictionaries that may contain floats. The only safe ways I can imagine to do subtraction would be to 1) allow subtraction only if we know element equality is safe (i.e. no floats), or 2) possibly introduce a new type of object designed to signify intervals over float arguments.
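Option 1) can be sketched as a guard that refuses subtraction whenever a float value is involved. This is a hypothetical helper over an assumed list-of-dicts model, not lancet's API; raising TypeError is also an assumed choice.

```python
def contains_float(space):
    """True if any value in any element of the parameter space is a float."""
    return any(isinstance(v, float) for element in space for v in element.values())


def safe_subtract(a, b):
    """Exact set subtraction, refused whenever floats make equality unreliable."""
    if contains_float(a) or contains_float(b):
        raise TypeError("subtraction is only supported for non-float values")
    return [x for x in a if x not in b]


runs = [{'model': 'lif', 'seed': s} for s in (1, 2, 3)]
print(safe_subtract(runs, [{'model': 'lif', 'seed': 2}]))
# [{'model': 'lif', 'seed': 1}, {'model': 'lif', 'seed': 3}]
```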

I can imagine abusing the steps argument of Range to specify such intervals (not suggesting that this is a good idea or that we should do this!):

lancet.Range('a', 2.5, 7.8, 11) - lancet.Range('a', 3.1, 7.27, steps=None)

The semantics of this is "The argument 'a' is sampled from 2.5 to 7.8 in 11 steps (inclusive) but any values of 'a' anywhere between 3.1 and 7.27 (inclusive) are excluded".
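Under the interval reading, the steps of the subtracted range are irrelevant and only its endpoints matter. A minimal sketch, assuming evenly spaced inclusive samples for the first range and rounding to 4 decimal places before comparing (both assumptions; `linspace` and `exclude_interval` are hypothetical helpers):

```python
def linspace(start, end, steps):
    """Evenly spaced, endpoint-inclusive samples (assumed to match Range's spacing)."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def exclude_interval(values, low, high, precision=4):
    """Keep only values outside the closed interval [low, high].

    Values are rounded to `precision` digits first, so a sample that lands a
    hair above 7.27 through float error is still treated as 7.27.
    """
    return [v for v in values if not (low <= round(v, precision) <= high)]


kept = exclude_interval(linspace(2.5, 7.8, 11), 3.1, 7.27)
print([round(v, 4) for v in kept])  # [2.5, 3.03, 7.8]
```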

I hope this illustrates the issues with the semantics of subtraction (implementing the ideas above would be quite involved!).

That said, one thing I can imagine is a 'meta' Args object that would be much easier to implement but could achieve the same thing:

args = lancet.Range('a', 2.5, 7.8, 11) * lancet.Args(b='foo')
lancet.Exclude(args, 'a', (3.1, 7.27))

Here Exclude takes an Args object, the key to filter and a specification of the range (or particular values) to exclude. This class could be called Filter but then you might expect that the interval is what you keep as opposed to what you want to exclude.

Although a predicate function (e.g. lambda x: 3.1 < x < 7.27) would be more general, such functions are unfortunately a bad fit for the declarative style and for reproducibility in general (issues with pickling, reprs, etc.). In addition to excluding float ranges as suggested above, I imagine you could exclude sets of discrete elements by specifying a list of strings, ints, etc.
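A rough sketch of how such an Exclude could behave, written as a plain function over an assumed list-of-dicts model rather than as a lancet Args subclass. The convention that a tuple means a closed interval and a list means discrete values is an assumption, as is rounding to 4 decimal places before comparing; the 'bar' value is added only to show discrete exclusion.

```python
def exclude(space, key, spec, precision=4):
    """Hypothetical Exclude: drop elements whose value for `key` matches `spec`.

    A (low, high) tuple is read as a closed float interval (values rounded to
    `precision` digits before comparing); any other collection is read as a
    set of discrete values to drop. Elements without `key` are kept.
    """
    def matches(value):
        if isinstance(spec, tuple):
            low, high = spec
            return low <= round(value, precision) <= high
        return value in spec

    return [e for e in space if not (key in e and matches(e[key]))]


args = [{'a': a, 'b': b} for a in (2.5, 3.56, 5.15, 7.8) for b in ('foo', 'bar')]
print(exclude(args, 'a', (3.1, 7.27)))
# [{'a': 2.5, 'b': 'foo'}, {'a': 2.5, 'b': 'bar'}, {'a': 7.8, 'b': 'foo'}, {'a': 7.8, 'b': 'bar'}]
print(exclude(args, 'b', ['bar']))
# [{'a': 2.5, 'b': 'foo'}, {'a': 3.56, 'b': 'foo'}, {'a': 5.15, 'b': 'foo'}, {'a': 7.8, 'b': 'foo'}]
```

Because the exclusion is stored as data (a key plus a tuple or list) rather than as a lambda, it stays picklable and has a meaningful repr.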

Thank you for reading all this and hopefully you will now see why this 'obvious' feature isn't actually so obvious after all. :-)

@jbednar
Member

jbednar commented May 11, 2015

Those are all important considerations to raise.

However, I don't think it would be as bad as all that in practice. If you went with the interpretation of - as set subtraction, I think it is unlikely that very many users would encounter floating-point issues, because whatever float rules are in place for their Args are unlikely to differ often for the subtracted Args, which would usually be written on the same line of code or the next one. For those usages that do have floating-point issues, I would think that a simple warning listing the values that failed to match the main Args would be easy to implement and would give feedback to show people how their expressions will be interpreted.

That said, there are also good arguments for treating the subtracted Args not as set subtraction but as range exclusion, as you mention. I think people are more likely to want to exclude a complete range or interval than specific values along an interval. I'm not sure it's necessary to implement a separate Exclude class or steps=None to achieve that; Lancet could simply specify that the subtraction operator always works on a range, ignoring any steps parameters. That way, when someone runs a parameter search over lancet.Range('a', 2.5, 7.8, 11) and then realizes they need a much larger range, lancet.Range('a', 0.5, 35, 15), without re-running what they already covered, they can simply reuse their previous range as the exclusion criterion, as in lancet.Range('a', 0.5, 35, 15) - lancet.Range('a', 2.5, 7.8, 11), without having to worry about making the steps within the two ranges line up precisely. Ignoring the steps isn't required, but respecting them seems more likely to be troublesome than ignoring them.
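The "extend the search without re-running" scenario can be sketched as follows, assuming Range yields evenly spaced inclusive samples. `linspace` and `subtract_as_range` are hypothetical helpers; the latter reads only the start and end of the covered range and ignores its step count, as proposed in this comment.

```python
def linspace(start, end, steps):
    """Evenly spaced, endpoint-inclusive samples (assumed to match Range's spacing)."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def subtract_as_range(values, covered):
    """Drop values within covered's [start, end]; covered's step count is ignored."""
    start, end, _steps = covered
    return [v for v in values if not (start <= v <= end)]


new = subtract_as_range(linspace(0.5, 35, 15), (2.5, 7.8, 11))
print(len(new))            # 13
print(round(new[1], 4))    # 7.8929
```

The 15-step range loses only the two samples that fall inside [2.5, 7.8]; the 11 steps of the old range never enter the comparison.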

Hope that makes sense! Overall, I do think it would be useful to add some sort of subtraction support, with appropriate warnings or documentation about how to use it effectively.


3 participants