
[bump minor] Pk1d covariance new #1067

Merged
merged 11 commits into from
Jul 15, 2024

Conversation

corentinravoux
Contributor

Two improvements:

  • Implementation of a new estimate of the covariance matrix, detailed here: https://www.overleaf.com/7592125432tksctmwfpngw#c4fed4
  • Covariance matrix calculation computed in a vectorized way and with memory optimization. It now runs in ~10 NERSC node-minutes for a Y1 sample.
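For reference, a weighted, vectorized covariance estimate over chunks can be sketched roughly as follows (a minimal toy illustration with hypothetical names, not the actual picca implementation; the exact estimator is defined in the linked Overleaf note):

```python
import numpy as np

def weighted_covariance(pk, weights):
    """Weighted covariance over chunks, fully vectorized.

    pk, weights: arrays of shape (n_chunks, nbins_k).
    C_ij = sum_c w_ci w_cj (p_ci - m_i)(p_cj - m_j) / sum_c w_ci w_cj,
    with m the per-bin weighted mean over chunks.
    """
    mean = (weights * pk).sum(axis=0) / weights.sum(axis=0)
    d = weights * (pk - mean)            # weighted residuals per chunk
    # matrix products replace the explicit double loop over (i, j) bins
    return (d.T @ d) / (weights.T @ weights)

rng = np.random.default_rng(0)
pk = rng.normal(size=(1000, 8))
cov = weighted_covariance(pk, np.ones_like(pk))
# with unit weights this reduces to the biased sample covariance
assert np.allclose(cov, np.cov(pk, rowvar=False, bias=True))
```

The vectorization replaces a per-mode double loop with two matrix products, which is where the large speed-up comes from.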

@corentinravoux
Contributor Author

[image: covariance matrix comparison]
Comparison of the covariance matrix calculation on a small example (left: before, right: after). They are not identical since the definition has changed, but their shapes are similar.

@corentinravoux corentinravoux self-assigned this Jul 5, 2024
@corentinravoux corentinravoux linked an issue Jul 5, 2024 that may be closed by this pull request
@Waelthus
Contributor

Waelthus commented Jul 5, 2024

Ok, this sounds like a great improvement. I guess 10 NERSC minutes should be much better than the previous version, and at least similar structures are obtained. How small is small? E.g. is a smoothing step necessary to make any sense out of the right/bottom 1/3 of the covariance?
Also, do we have an idea where the subtle differences come from? Is this just numerical, e.g. from a different order of operations, or is the difference in method actually analytically expected to produce slightly different results? E.g. at the strong correlation spikes around pixel [0, 8] (is that correlation due to the SiII/SiIII correlations? Or something to do with DLAs/continuum fitting? It looks like the longest-range correlation we have in here, but somewhat weaker in the new approach...)
I guess we cannot do inter-redshift-bin correlations just yet, but that should be ok...

Could you run this on a significantly larger set of data? I'll go through the code changes now...

@corentinravoux
Contributor Author

corentinravoux commented Jul 5, 2024

This is a very small example, like 2-3% of Y1, just for the test.
The main difference between the estimators is that the weights are now treated properly in the normalization of the whole covariance matrix, so its profile can change according to the average shape of the weights over the redshift range. I think it is hard to interpret.
For "the strong correlation spikes around pixel [0, 8]", I think this is due to the lack of modes, which increases the statistical error bar at very large scales. This can also be impacted by what you mention.

In the Y1 run, with systematics included (I will show it at the upcoming DESI meeting):
[image]

Contributor

@Waelthus Waelthus left a comment


This looks fine overall; it needs somewhat more commenting, and it would be good to have some of the selections defined in one place for flexibility/easier maintenance.

py/picca/pk1d/postproc_pk1d.py (resolved)
py/picca/pk1d/postproc_pk1d.py (resolved)
return covariance_array


def compute_cov_not_vectorized(
Contributor

this is not only not vectorized, but actually the complete old method, i.e. not only slower but also potentially doing slightly different things, by actually summing up each individual mode every time. Maybe clarify that in the docstring

Contributor Author

That is what we are doing in the new method too, no?
I have added a comment on this function

Contributor

yes, but in the old one the approach is split in a completely different way, by looping over every mode, while in the new one we precompute intermediate summaries that are then reused multiple times, and in addition do everything in a vectorized way...

Contributor Author

Comment added

py/picca/pk1d/postproc_pk1d.py (resolved)
mean_pk_product = np.outer(mean_pk, mean_pk)

sum_p1d_weights = np.nansum(p1d_weights, axis=0)
weights_sum_product = np.outer(sum_p1d_weights, sum_p1d_weights)
Contributor

this is very confusing naming given that below you have weights_product_sum

Contributor Author

Yes, but we indeed have a sum of products and a product of sums. I renamed them with the "of"; it should be a little clearer

Contributor

maybe adding what is summed into the name could help, but I'm not sure...
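To illustrate the distinction the names are trying to capture (a toy example with hypothetical names, not the picca code): the outer product of summed weights and the sum of per-chunk outer products are genuinely different matrices:

```python
import numpy as np

w = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # two chunks, two k bins

# product of sums: sum the weights over chunks first, then take the outer product
product_of_sums = np.outer(w.sum(axis=0), w.sum(axis=0))

# sum of products: per-chunk outer products, then sum over chunks
sum_of_products = sum(np.outer(row, row) for row in w)

assert product_of_sums.tolist() == [[16.0, 24.0], [24.0, 36.0]]
assert sum_of_products.tolist() == [[10.0, 14.0], [14.0, 20.0]]
```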

py/picca/pk1d/postproc_pk1d.py (outdated, resolved)
else:
p1d_sub_table["weight"] = 1

p1d_sub_table = p1d_sub_table[p1d_sub_table["k_index"] >= 0]
Contributor

are we storing negative k_index somewhere? Why is this cut needed?

Contributor Author

This was a convention for the case where a k bin computed for the p1d does not fall inside the output k binning. I removed those bins as they do not contribute to the covariance.

Contributor Author

This cut is needed because I made a method to compute the p1d groups in a vectorized way, and the -1 values would change the last k bin, while they should not change any bin

Contributor

I still don't fully get how this line changes anything. Wouldn't the selector always be true? Or are there NaNs or -1s in that table for bins that just aren't filled and that you want to remove here (in which case I would test for not having exactly that filler value, rather than via >= 0)?

Contributor Author

With the current implementation, this is just to remove p1d pixels which were not associated with any wavenumber bin.
I made a method previously for which this cut was necessary, but now it is just to reduce the input dataset a little, removing unnecessary pixels.
Comment added.

"""

select_z = (p1d_table["forest_z"] < zbin_edges[izbin + 1]) & (
p1d_table["forest_z"] > zbin_edges[izbin]
Contributor

this is the p1d_table of each chunk and not the mean_p1d_table of the combined stats, so my comments regarding indexing shouldn't apply here...

nbins_k,
)
if number_worker == 1:
output_cov = [func(*p1d_los) for p1d_los in p1d_los_table]
Contributor

not sure if this should be *p1d_los or just p1d_los; I'd have guessed the latter. Was this tested on a single core as well?

Contributor Author

this is *p1d_los because p1d_los is a table with 5 arrays that we pass to func. Yes, I tested it

Contributor

but why isn't that needed below in the mapped version? Shouldn't that be a starmap then, or similar? But I guess if it works, it works...
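The difference between the two calling conventions can be checked in isolation (a toy example; multiprocessing.Pool.starmap unpacks argument tuples the same way itertools.starmap does):

```python
from itertools import starmap

def func(a, b, c):
    return a + b + c

rows = [(1, 2, 3), (4, 5, 6)]

# serial branch: unpack each row explicitly with *row
serial = [func(*row) for row in rows]

# mapped branch: starmap unpacks the tuples for us; a plain map(func, rows)
# would call func((1, 2, 3)) and fail with a missing-arguments TypeError
mapped = list(starmap(func, rows))

assert serial == mapped == [6, 15]
```

So the serial list comprehension needs the explicit `*`, while a starmap-style parallel call does the unpacking itself.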

i_max = (izbin + 1) * nbins_k * nbins_k
cov_table["covariance"][i_min:i_max] = covariance_array

if compute_bootstrap:
Contributor

is the bootstrap method tested? If yes, it would be nice to know its results; if not, please at least add comments that this isn't tested yet...

Contributor Author

The bootstrap method works and gives results very similar to the classical covariance, which makes sense considering the number of chunks.
All the Y1 plots I am showing use the bootstrap.

@Waelthus
Contributor

Waelthus commented Jul 5, 2024

In Y1 run, associated to systematics (I will show it at the upcoming DESI meeting)

Ouch, this looks much more correlated than I expected... Can you change the colormap so that it actually goes from -1 to 1, and maybe also make it white/yellow at 0 correlation and diverging outside? I looked at it and at first glance thought green was zero, but that isn't actually right, and it looks like, except when comparing the largest and smallest modes available, things are >50% correlated everywhere

@corentinravoux
Contributor Author

Be careful interpreting that, because the systematic covariance matrix is computed in a very simplified way, thus boosting the correlations.
Bootstrap stat-only covariance:
[image]
stat + syst:
[image]
As you can see, with the simple syst cov model it is exaggerated

@Waelthus
Contributor

maybe we should also add a line for covar computation here:

shutil.copy(self._masterFiles + "/test_Pk1D/Pk1D.fits.gz",
self._branchFiles + "/Products/meanPk1D")
print(os.listdir(self._branchFiles + "/Products/meanPk1D"))
cmd = "picca_Pk1D_postprocess.py "
cmd += " --in-dir " + self._branchFiles + "/Products/meanPk1D"
cmd += " --output-file " + self._branchFiles + "/Products/meanPk1D/meanPk1D.fits.gz"
#- small sample => k,z-bins changed wrt default ones
cmd += " --zedge-min 2.1 --zedge-max 3.1 --zedge-bin 0.2"
cmd += " --kedge-min 0.015 --kedge-max 0.035 --kedge-bin 0.005"
picca_Pk1D_postprocess.main(cmd.split()[1:])

and compare the output to a saved state. That would at least make sure this code is run regularly (e.g. useful for library version changes) and that we don't change stuff by accident, even if the covariance of so few spectra will be garbage...

Contributor

@Waelthus Waelthus left a comment

Given that this PR seems to do what it's supposed to, I'd be fine with merging. Please consider adding comments to the code where there are still semi-open questions (so that we e.g. don't wonder why we did certain steps in the future and remove those). Also please consider adding a line to the test in your favourite config with the relevant output (or send me the line you'd like to test and I'd add it).

@corentinravoux
Contributor Author

Comments and tests added; waiting for pytest. If successful, let's merge it

@Waelthus
Contributor

if one would like to use the bootstrap in testing, maybe allow setting a seed in the command-line call via an additional argument, and initialize an np.random.default_rng with that seed if given (or without it if not); then the bootstrap could also run deterministically.
But it's not really needed here, and I'm glad there is a test at all
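A minimal sketch of that suggestion (hypothetical function and argument names, not the picca CLI): thread the seed through to np.random.default_rng, which accepts None for non-deterministic behaviour:

```python
import numpy as np

def bootstrap_indices(n_chunks, n_bootstrap, seed=None):
    """Draw chunk indices for bootstrap resampling.

    With seed=None the resampling is non-deterministic; with a fixed
    seed it is reproducible, which makes it usable in a regression test.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_chunks, size=(n_bootstrap, n_chunks))

a = bootstrap_indices(100, 5, seed=42)
b = bootstrap_indices(100, 5, seed=42)
assert (a == b).all()          # same seed, same resampled chunk sets
assert a.shape == (5, 100)
```

The seed would simply be forwarded from the command-line argument, defaulting to None so current behaviour is unchanged.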

@corentinravoux
Contributor Author

Here, since the bootstrap is just a rerun of the covariance matrix calculation, it is not essential to test it.
But yes, the seed for the bootstrap can be added later.
I think we can merge now

@Waelthus Waelthus added this pull request to the merge queue Jul 15, 2024
@Waelthus Waelthus removed this pull request from the merge queue due to a manual request Jul 15, 2024
@Waelthus Waelthus changed the title Pk1d covariance new [bump minor] Pk1d covariance new Jul 15, 2024
@Waelthus Waelthus added this pull request to the merge queue Jul 15, 2024
Merged via the queue into master with commit c2b4fad Jul 15, 2024
10 checks passed
@Waelthus Waelthus deleted the pk1d_covariance_new branch July 15, 2024 16:21
Successfully merging this pull request may close these issues.

Covariance measurement for Pk1d needs improvements