Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to return indexes in conditional abundance matching #913

Merged
merged 12 commits into from
Jun 8, 2018

Conversation

Christopher-Bradshaw
Copy link
Contributor

@Christopher-Bradshaw Christopher-Bradshaw commented Jun 7, 2018

As discussed in person + slack the primary goal of this is to:

  • Add an option to return the index of y2 that was matched (rather than the value in y2) to the CAM code

I made one other minor change - previously, subgrid noise was not being added to the left and right edges. Now it should be.

I've put a number of comments on the PR where I have done something I am not sure about. Let me know what you think!

from .tests.naive_python_cam import sample2_window_indices


def conditional_abunmatch(x, y, x2, y2, nwin, add_subgrid_noise=True,
assume_x_is_sorted=False, assume_x2_is_sorted=False):
assume_x_is_sorted=False, assume_x2_is_sorted=False, return_indexes=False):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the best api is - another option would be to have two functions:
conditional_abunmatch and conditional_abunmatch_indexes. I don't know how this is done elsewhere in halotools/how you would like to do it going forward. Your call :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only time I prefer creating an entirely new function is in cases where adding in some new feature considerably slows performance; if performance is mission-critical to the original function, I just write a new one. The additional algebraic operations added here are minor enough that I think what you've done here is preferable.

@@ -159,7 +183,9 @@ def cython_bin_free_cam_kernel(double[:] y1, double[:] y2, int[:] i2_match, int
cdef int idx_in1, idx_out1, idx_in2, idx_out2
cdef double value_in1, value_out1, value_in2, value_out2

cdef double[:] y1_new = np.zeros(npts1, dtype='f8') - 1
cdef int[:] y1_new_indexes = np.zeros(npts1, dtype='i4') - 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to define these in if statements:

if return_indexes:
    cdef int[:] y1_new_indexes = np.zeros(npts1, dtype='i4') - 1
else:
    cdef double[:] y1_new_values = np.zeros(npts1, dtype='f8') - 1

but that wasn't allowed?

Defining both is a bit unnecessary, but I couldn't find a nice way around it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't know a way around this either. This seems fine.


if iy2 > iy2_max:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that this shouldn't be possible and this set of if if else block confused me a bit at first. I think this reads a bit easier now but I'm happy to revert if you disagree.

Copy link
Contributor

@aphearin aphearin Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is a little easier to read, but min returns a python object, so it's better practice to use a c-imported min function instead:

from libc.math cimport fmin as c_min

After looking at the cython -a output, in this case I'm pretty sure the cython compiler figures it out anyway and there is no performance hit, so just pointing this out so you are aware of python-cython interactions.

Also watch out for this on lines 130 & 131 inside the definition of get_value_at_rank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as these are ints I don't think there is a min/max func in libc.math for them. (checked https://github.com/cython/cython/blob/master/Cython/Includes/libc/math.pxd)

http://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html suggests that cython has a fast min/max for ints (as you saw in the cython -a output).

high_rank = rank1 + 1
if low_rank < 0:
low_rank = 0
elif high_rank >= nwin:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be if rather than elif? That is what I have in the function (or the same with min/max)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, no problem at all to leave as is - mostly I was pointing this out for pedagogical reasons: since this is one of your first cython function contributions I thought it might be helpful to be explicitly aware of all python-layer interactions. But cython -a does reveal that the cython compiler has this very well cleaned.

@@ -384,11 +386,13 @@ def test_subgrid_noise1():
result2 = conditional_abunmatch(x, y, x2, y2, nwin1, add_subgrid_noise=True)
assert np.allclose(result, result2, atol=0.1)
assert not np.allclose(result, result2, atol=0.02)
assert np.all(result - result2 != 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be enough to assert that we are adding noise to the leftmost/rightmost edges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually ensure that noise was added to the edges? Or just that noise was added somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it ensures that noise was added to every point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, upon second look I see that now.

x = np.random.uniform(0, 10, n1)
y = np.random.uniform(0, 1, n1)

with NumpyRNGContext(fixed_seed2):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I used the same seed here as above tests were passing when they shouldn't (before I had correctly reordered/reindexed the indexes). I think that was because we were in the special case of x == x2 and y == y2 + c.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yes, good catch, sometimes I have to use different seeds for such purposes as well.

@cython.boundscheck(False)
@cython.nonecheck(False)
@cython.wraparound(False)
def get_value_at_rank(double[:] sorted_values, int rank1, int nwin, int add_subgrid_noise):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, quick implementation comment about this get_value_at_rank function. This function is declared with def rather than cdef, which means get_value_at_rank returns a python object. If you run cython -a bin_free_cam_kernel.pyx and inspect the html output, you will see deep yellow on line 270. Click on it and see all the code cython generates to compile this line. To fix this, get_value_at_rank needs to be C-declared. See, e.g., the definition of _bisect_left_kernel. With this change, this means that get_value_at_rank will no longer be accessible by the outside world, but only accessible within cython, so you will not be permitted to add it to __all__.

In general, it's good to get in the habit of always running cython -a whenever reviewing cython code. Because cython is a superset of python, the compiler will allow you to write code with avoidable python-layer interactions without complaining, so you just have to be vigilant and always check this manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defed it so that it would be accessible to the code that does the left and right edges.

But maybe a better way would be to have an internal cdef int _get_value_at_rank and then an external def get_value_at_rank that just wraps this?

Running cython -a is definitely a good idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bin_free_cam_kernel.html looks clean as of this commit. I also just did a simple timing test, comparing your branch to the current version of master, and it looks like there is a nearly negligible performance increase from this feature, so I think this PR is all set once it passes CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, thanks for the comments!

@cython.boundscheck(False)
@cython.nonecheck(False)
@cython.wraparound(False)
cdef int _find_index(int[:] arr, int val):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _find_index function implements a blind lookup for a value in the correspondence indices array. The cython is clean (though watch out for the undeclared index variable on line 265), but I think the algorithm can be improved by taking advantage of the binary search that is already implemented, no? While the correspondence_indx2 array is not sorted, the sorted_cdf_values2 maintains a sorted order. If an original array of indices were defined at the very beginning of the engine, simply with np.arange(npts2), couldn't that array be manipulated alongside sorted_cdf_values2 to avoid the blind lookup? Naively, it seems this would be faster; it could be that the performance hit from the additional manipulations wash out the improvement, I don't know, but it seems worth testing since the cythonized binary search machinery is already in place. Let me know whether this makes sense.

Copy link
Contributor Author

@Christopher-Bradshaw Christopher-Bradshaw Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand what you are suggesting:
In the same way that we have the correspondence_indx2 array to keep track of how to unsort the sorted_cdf_values2, we can have a another array that keeps track of where things are in the correspondence_indx2? That would make this _find_index just a lookup in that map.

I think that this has roughly the same time complexity - when we update the correspondence indicies (to shift them along) we will need to update this array too and that would be roughly O(len(nwin)) which is I think what we have now.

But I could well have missed something... I definitely don't fully understand all the internals.

(index cdefed in f05be1c)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be right, this might not be worth the trouble. After cleaning the cython of the remaining python interactions as discussed elsewhere, could you just do a simple timing test comparing your implementation against what is currently in master? If runtimes are comparable, then it means that further optimization is not worth spending time on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the timing test, just make sure to use at least ~1e6 points

Copy link
Contributor Author

@Christopher-Bradshaw Christopher-Bradshaw Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new code

Not returning indexes 143.03s
Returning indexes 171.74s

In master

Not returning indexes 150.37s
... TypeError: conditional_abunmatch() got an unexpected keyword argument 'return_indexes'                  

So returning indexes is ~20% slower, probably due to that extra linear search over the window. I used 1e7 points in both x and y and a 10001 sized window.

But no change in the returning values case.

@aphearin aphearin merged commit 0b9d31c into astropy:master Jun 8, 2018
@Christopher-Bradshaw Christopher-Bradshaw deleted the idx-cam-kernel branch June 12, 2018 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants