Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pair correlations #524

Merged
merged 19 commits into from
Oct 28, 2020
Merged

Conversation

SvenPVoigt
Copy link
Contributor

@SvenPVoigt SvenPVoigt commented Sep 29, 2020

Pair Correlations can be calculated in many ways. In this pull request, I am calculating pair correlations from 2point statistics- in the new function paircorr_from_twopoint. One caveat of using two point statistics is that at large r values, probability values become very small and closely spaced. Therefore, the function provides a cutoff_r and an interpolate_n parameter for dealing with probabilities at large r values.

Additionally, I provide a dist_from_center function, which could be used in other functions as well because it is so generic.

I put both functions in pair_correlations.py, but I am open to adding dist_from_center in func.py and paircorr_from_twopoint in correlations.py. If I change the location of the functions, is there a git tool (maybe the squash and merge option) for cleaning up commits so that only the changes to final (not intermediate) files are represented?

Also, if there is any area for increasing the usage of dask, please let me know. I will try to identify some additional spots as well. Right now, I am simply validating that dask and numpy arrays can be passed as input to my functions.

…n function for one sample, which will be used to validate multisample function
…number of samples. Tests notebook validates the accuracy, function options, and extension to 2+ dimensions.
@wd15
Copy link
Contributor

wd15 commented Sep 29, 2020

@SvenPVoigt nice job. I'll hopefully review this in the next few days.

@wd15 wd15 added this to the 0.4.1 milestone Oct 2, 2020
@wd15
Copy link
Contributor

wd15 commented Oct 7, 2020

I'm a little confused. I think that paircorr_from_twopoint should return a tuple with the probabilities in the first item and the radii in the second item. I don't think it's good to store data with different units in the same array. Also, the probabilities should be shaped (n_samples, n_interpolate) to fit in with how the first axis is the sample axis as in the rest of PyMKS. What do you think?

@wd15
Copy link
Contributor

wd15 commented Oct 8, 2020

New dist_from_center code?

center = lambda x: np.reshape(
    np.array(x) // 2,
    (len(x),) + (1,) * len(x)
)
    
dist_mesh = sequence(
    fmap(lambda x: np.linspace(0, x - 1, x)),
    lambda x: np.array(np.meshgrid(*x, indexing='ij')),
    lambda x: x - center(x.shape[1:]),
    lambda x: np.linalg.norm(x, axis=0)
)

Make paircorr_from_twopoint more functional. Split steps into
sub-functions to make it easier to determine the
steps. paircorr_from_twopoint is added to the API docs. The test
notebook has been removed as the only significant tests where already
doc strings. The function works with Dask correctly and is tested with
Dask.
@wd15
Copy link
Contributor

wd15 commented Oct 13, 2020

@SvenPVoigt I've made a PR into your branch. You probably should merge upstream's master and rebase and then I'll redo the PR. Anyway I've made things more functional. Also,

  • added paircorr_from_twopoint to the API docs
  • make make_da keep its docstring
  • Made the docstring better

You might want to check that you're happy with the API docs.

SvenPVoigt and others added 7 commits October 19, 2020 15:32
Make pair correlations function more functional
Use less verbose output so that Travis CI failures are easier to
parse.
The rendering for table views of Dask arrays changes from version to
version so can't be included in tests.
Ignore the rendering of Dask arrays in the notebooks when testing. The
Dask array rendering is subject to HTML style changes so that tests
will break whenever the Dask array changes how the table is
rendered. To test for dask arrays, explicitly test the values, chunks
or shape.
@wd15
Copy link
Contributor

wd15 commented Oct 20, 2020

@auag92, @beyucel : can you both review this now. It's ready. Also, @SvenPVoigt can you also give your seal of approval in the comments. Thanks

@wd15
Copy link
Contributor

wd15 commented Oct 20, 2020

@auag92 @beyucel : I added @SvenPVoigt new function to the API so please check that it reads well, https://pymks--524.org.readthedocs.build/en/524/API.html#pymks.paircorr_from_twopoint

@wd15 wd15 merged commit 86b6016 into materialsinnovation:master Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants