Real data analysis #201

vuillaut · 2019-11-04T16:38:04Z

Some changes to include real data handling in the analysis.

The approach is to keep the same functions as much as possible for MC or real data.
There are discrepancies between real data and MC right now that need some handling though.
This is handled with if conditions on the simu metadata.
So this PR should not change anything for MC analysis (unit tests are passing so the pipeline works).

In the future, I hope we can progressively get rid of these conditions.
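
As an illustration of the approach described above, here is a minimal sketch of a fill step shared by MC and real data, with the MC-only fields guarded by a flag. `Dl1Row`, `fill_row` and the dictionary layout of `event` are hypothetical names for this example, not the lstchain API; only `is_simu`, `mc_energy`, `log_mc_energy` and `log_intensity` mirror names used in this PR, and the energy unit is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dl1Row:
    """Hypothetical stand-in for the DL1 parameters container."""
    intensity: float
    log_intensity: Optional[float] = None
    mc_energy: Optional[float] = None      # simulations only (unit assumed: TeV)
    log_mc_energy: Optional[float] = None  # simulations only


def fill_row(event: dict, is_simu: bool) -> Dl1Row:
    """Fill one row; the same code path serves MC and real data."""
    row = Dl1Row(intensity=event["intensity"])
    if is_simu:
        # MC truth only exists for simulated events.
        row.mc_energy = event["mc_energy"]
        row.log_mc_energy = math.log10(event["mc_energy"] * 1e3)
    row.log_intensity = math.log10(row.intensity)
    return row


real = fill_row({"intensity": 100.0}, is_simu=False)
simu = fill_row({"intensity": 100.0, "mc_energy": 1.0}, is_simu=True)
```

With this layout, real-data rows simply leave the MC fields unset, and the MC path is unchanged.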

lstchain/reco/dl0_to_dl1.py

garciagenrique

Seems OK for me, a second check could be useful, though.
Just check the comment about the gain selection.

morcuended · 2019-11-06T17:23:44Z

lstchain/reco/dl0_to_dl1.py

+                    if is_simu:
+                        dl1_container.mc_energy = event.mc.energy.value
+                        dl1_container.log_mc_energy = np.log10(event.mc.energy.value * 1e3)
+
                    dl1_container.log_intensity = np.log10(dl1_container.intensity)
                    dl1_container.gps_time = event.trig.gps_time.value


For the pointing interpolation we need the event time. Since time information is not currently being written to the event.trig.gps_time field, for the time being I've been getting the event time as:

from datetime import datetime
from astropy.time import Time

start_ntp = event.lst.tel[tel_id].svc.date
ntp_time = start_ntp + event.r0.tel[telescope_id].trigger_time
utc_time = Time(datetime.utcfromtimestamp(ntp_time))
gps_time = utc_time.gps
dl1_container.gps_time = gps_time

This should be changed back to the line you wrote once the gps_time field is filled.
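
For reference, the UTC-to-GPS step in the snippet above can be reproduced with the standard library alone. This is a minimal sketch, not the lstchain code: `unix_to_gps` is a hypothetical helper, and it hard-codes the 18 s GPS-UTC offset, which is valid from 2017-01-01 until the next leap second is introduced. astropy's `Time(...).gps` handles leap seconds in general and remains the better choice in the pipeline.

```python
from datetime import datetime, timezone

# GPS epoch: 1980-01-06 00:00:00 UTC, expressed as a Unix timestamp.
GPS_EPOCH_UNIX = datetime(1980, 1, 6, tzinfo=timezone.utc).timestamp()
# GPS time does not apply leap seconds; since 2017-01-01 it is 18 s ahead of UTC.
GPS_UTC_OFFSET_S = 18
LAST_LEAP_UNIX = datetime(2017, 1, 1, tzinfo=timezone.utc).timestamp()


def unix_to_gps(unix_time: float) -> float:
    """Convert a Unix (UTC) timestamp to GPS seconds (hypothetical helper)."""
    if unix_time < LAST_LEAP_UNIX:
        # Earlier dates need a different offset; refuse rather than be wrong.
        raise ValueError("fixed 18 s offset is only valid from 2017-01-01 on")
    return unix_time - GPS_EPOCH_UNIX + GPS_UTC_OFFSET_S
```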

Ok thanks.
I can integrate this change for real data.
Or you can PR into vuillaut:real_data so we aggregate all the changes we need to make for real data analysis and add them at once to lstchain.

Ok, I did not know I could do that. I will work on your branch then.

morcuended · 2019-11-19T10:49:28Z

In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

vuillaut · 2019-11-19T10:53:47Z

> In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

In my opinion, no, we want to follow CTA recommendations and produce files corresponding to the official structure.
I see the interest of having DL1b of course. Maybe we can have a simple script remove_image_node for people to use on their own if they want to move data around more easily

labsaha · 2019-11-19T10:58:08Z

In yesterday's meeting @moralejo was asking about two separate files with same structure, but in one a is filled and in other one b is filled. Can it be done easily @vuillaut ?

morcuended · 2019-11-19T11:02:26Z

> > In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo
>
> In my opinion, no, we want to follow CTA recommendations and produce files corresponding to the official structure.
> I see the interest of having DL1b of course. Maybe we can have a simple script remove_image_node for people to use on their own if they want to move data around more easily

This would be easier than creating two h5 files already within the code

moralejo · 2019-11-19T15:31:45Z

> In yesterday's meeting @moralejo was asking about two separate files with same structure, but in one a is filled and in other one b is filled. Can it be done easily @vuillaut ?

Yes, that is what i was asking for. It is not just that image-less files are easier to move around (that is indeed one reason). It is also that we will want to have different DL1b versions (mostly, different cleaning approaches) out of the same DL1a. This can be done with several "versions" of DL1b inside a single file, or with separate DL1b files for each cleaning approach.
In both cases these would be "standard" DL1 files in which either the images, or the parameters, are missing.
So, would it be ok to have DL1 files with only DL1a inside?

vuillaut · 2019-11-19T18:37:07Z

> > In yesterday's meeting @moralejo was asking about two separate files with same structure, but in one a is filled and in other one b is filled. Can it be done easily @vuillaut ?
>
> Yes, that is what i was asking for. It is not just that image-less files are easier to move around (that is indeed one reason). It is also that we will want to have different DL1b versions (mostly, different cleaning approaches) out of the same DL1a. This can be done with several "versions" of DL1b inside a single file, or with separate DL1b files for each cleaning approach.
> In both cases these would be "standard" DL1 files in which either the images, or the parameters, are missing.
> So, would it be ok to have DL1 files with only DL1a inside?

Ok I see, for the second point, the current approach I have been applying is:

- always produce full DL1 by default (as it is done now)
- remove images when not needed
- recompute parameters to create a new DL1 file from a DL1 file (in this case, we need the images, but also we don't need to rewrite all parameters, most of them can be copied, so we go from full DL1 to full DL1)

Would this approach be ok for you?
Unless there are arguments for it, I don't like too much the idea of splitting in DL1a/b by default.

I will make another PR for the dl1_to_dl1 script.
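
The "remove images when not needed" step above amounts to dropping one subtree of the file. Below is a minimal sketch of the selection logic only, working on node paths so that it runs without an HDF5 library. `split_image_nodes` is a hypothetical helper, and the node paths are taken from the file listings in this thread.

```python
from typing import Iterable, List, Tuple

IMAGE_GROUP = "dl1/event/telescope/image"


def split_image_nodes(keys: Iterable[str],
                      image_group: str = IMAGE_GROUP) -> Tuple[List[str], List[str]]:
    """Split node paths into (kept, dropped) for an image-less DL1 file."""
    kept, dropped = [], []
    for key in keys:
        path = key.strip("/")
        # Match the group itself or anything below it, not mere name prefixes.
        is_image = path == image_group or path.startswith(image_group + "/")
        (dropped if is_image else kept).append(key)
    return kept, dropped


keys = [
    "dl1/event/telescope/image/LST_LSTCam",
    "dl1/event/telescope/parameters/LST_LSTCam",
    "instrument/subarray/layout",
]
kept, dropped = split_image_nodes(keys)
```

In an actual script the dropped paths would be removed with the HDF5 library in use. Since removing a node does not by itself shrink an HDF5 file, the result would typically be repacked or copied to a new file.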

morcuended · 2019-11-19T19:27:54Z

lstchain/reco/dl0_to_dl1.py

                    writer.write(table_name=f'telescope/image/{tel_name}',
-                                 containers=[event.dl0, tel, extra_im])
+                                 containers=[event.r0, tel, extra_im])


Shouldn't we write the event.dl1 container with images and pulse times here, instead of event.r0 or event.dl0 as you had before? I was unsure about this when I was writing the UCM script.

The images and pulse time come from tel. event.r0 includes info such as obs_id and event_id

That's true
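
To make the point above concrete: the row written to the image table combines the fields of all containers passed to the writer, so the identifiers come from one container and the images from another. The following is a toy model of that merge with plain dictionaries, not ctapipe's writer. The field values are invented, and attributing `tel_id` to `extra_im` is an assumption.

```python
def merge_containers(*containers: dict) -> dict:
    """Toy model: one table row built from the fields of several containers."""
    row = {}
    for container in containers:
        row.update(container)
    return row


# Invented values; the column names match the image table produced in this PR.
r0 = {"obs_id": 1566, "event_id": 1}                     # identifiers
tel = {"image": [0.0, 1.5], "pulse_time": [10.0, 12.0]}  # per-pixel data
extra_im = {"tel_id": 1}                                 # assumed content

row = merge_containers(r0, tel, extra_im)
```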

morcuended · 2019-11-19T19:32:29Z

scripts/lstchain_data_r0_to_dl1.py

+    os.makedirs(args.outdir, exist_ok=True)
+
+    dl0_to_dl1.allowed_tels = {1, 2, 3, 4}
+    output_filename = args.outdir + '/dl1_' + os.path.basename(args.infile).rsplit('.', 1)[0] + '.h5'


This will produce filenames such as dl1_LST-1.1.Run01555.0000.fits.h5. Maybe we should strip the fits part as well, to get dl1_LST-1.1.Run01555.0000.h5

You are right, and right now it also produces names like ***.simtel.h5.
Cons: that's a double file extension.
Pros: it keeps the info about the origin.
I don't have a strong preference.

Leave it as it is then
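
The naming behaviour discussed above follows from `rsplit('.', 1)`, which strips only the last extension. A short sketch: the `.fits.fz` and `.simtel.gz` input names are assumptions chosen to reproduce the outputs quoted in this thread, and `dl1_filename`, `strip_all` and `KNOWN_EXTENSIONS` are hypothetical, not part of the script.

```python
import os

KNOWN_EXTENSIONS = (".fz", ".gz", ".fits", ".simtel")  # assumed set


def dl1_filename(infile: str, outdir: str, strip_all: bool = False) -> str:
    """Build the DL1 output path for an input file."""
    base = os.path.basename(infile)
    if strip_all:
        # Peel off known extensions only, so the dots in 'LST-1.1' survive.
        stripped = True
        while stripped:
            stripped = False
            for ext in KNOWN_EXTENSIONS:
                if base.endswith(ext):
                    base = base[: -len(ext)]
                    stripped = True
    else:
        # Behaviour of the script in this PR: drop the last extension only.
        base = base.rsplit(".", 1)[0]
    return os.path.join(outdir, "dl1_" + base + ".h5")
```

Stripping by a fixed list rather than splitting on every dot matters here because the run names themselves contain dots.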

morcuended · 2019-11-19T19:41:22Z

I tested with real data and it works fine and produces a dl1 h5 file that contains:

File(filename=dl1_data/dl1_LST-1.1.Run01566.0000.fits.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/dl1 (Group) ''
/dl1/event (Group) ''
/dl1/event/subarray (Group) ''
/dl1/event/subarray/mc_shower (Table(53000,)) 'Storage of DL0Container,MCEventContainer'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "mc_alt": Float64Col(shape=(), dflt=0.0, pos=1),
  "mc_az": Float64Col(shape=(), dflt=0.0, pos=2),
  "mc_core_x": Float64Col(shape=(), dflt=0.0, pos=3),
  "mc_core_y": Float64Col(shape=(), dflt=0.0, pos=4),
  "mc_energy": Float64Col(shape=(), dflt=0.0, pos=5),
  "mc_h_first_int": Float64Col(shape=(), dflt=0.0, pos=6),
  "mc_x_max": Float64Col(shape=(), dflt=0.0, pos=7),
  "obs_id": Int64Col(shape=(), dflt=0, pos=8)}
  byteorder := 'little'
  chunkshape := (910,)
/dl1/event/subarray/trigger (Table(53000,)) 'Storage of DL0Container,CentralTriggerContainer'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "obs_id": Int64Col(shape=(), dflt=0, pos=1)}
  byteorder := 'little'
  chunkshape := (4096,)
/dl1/event/telescope (Group) ''
/dl1/event/telescope/image (Group) ''
/dl1/event/telescope/image/LST_LSTCam (Table(51228,)) 'Storage of R0Container,DL1CameraContainer,ExtraImageInfo'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "image": Float64Col(shape=(1855,), dflt=0.0, pos=1),
  "obs_id": Int64Col(shape=(), dflt=0, pos=2),
  "pulse_time": Float64Col(shape=(1855,), dflt=0.0, pos=3),
  "tel_id": Int64Col(shape=(), dflt=0, pos=4)}
  byteorder := 'little'
  chunkshape := (8,)
/dl1/event/telescope/parameters (Group) ''
/dl1/event/telescope/parameters/LST_LSTCam (Table(51228,)) 'Storage of DL1ParametersContainer'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "intensity": Float64Col(shape=(), dflt=0.0, pos=1),
  "intercept": Float64Col(shape=(), dflt=0.0, pos=2),
  "kurtosis": Float64Col(shape=(), dflt=0.0, pos=3),
  "leakage": Float64Col(shape=(), dflt=0.0, pos=4),
  "length": Float64Col(shape=(), dflt=0.0, pos=5),
  "log_intensity": Float64Col(shape=(), dflt=0.0, pos=6),
  "mc_core_distance": Float64Col(shape=(), dflt=0.0, pos=7),
  "n_islands": Int64Col(shape=(), dflt=0, pos=8),
  "obs_id": Int64Col(shape=(), dflt=0, pos=9),
  "phi": Float64Col(shape=(), dflt=0.0, pos=10),
  "psi": Float64Col(shape=(), dflt=0.0, pos=11),
  "r": Float64Col(shape=(), dflt=0.0, pos=12),
  "skewness": Float64Col(shape=(), dflt=0.0, pos=13),
  "tel_id": Int64Col(shape=(), dflt=0, pos=14),
  "tel_pos_x": Float64Col(shape=(), dflt=0.0, pos=15),
  "tel_pos_y": Float64Col(shape=(), dflt=0.0, pos=16),
  "tel_pos_z": Float64Col(shape=(), dflt=0.0, pos=17),
  "time_gradient": Float64Col(shape=(), dflt=0.0, pos=18),
  "width": Float64Col(shape=(), dflt=0.0, pos=19),
  "wl": Float64Col(shape=(), dflt=0.0, pos=20),
  "x": Float64Col(shape=(), dflt=0.0, pos=21),
  "y": Float64Col(shape=(), dflt=0.0, pos=22)}
  byteorder := 'little'
  chunkshape := (356,)

So everything seems fine to me. One question though: why does the real data h5 file still inherit the table /dl1/event/subarray/mc_shower?

vuillaut · 2019-11-19T20:05:42Z

> I tested with real data and it works fine and produces a dl1 h5 file that contains:

Thanks a lot for giving it a try!

> So everything seems fine to me. One question though: why does the real data h5 file still inherit the table /dl1/event/subarray/mc_shower?

That's not intended indeed. Could you tell me the content of this table please?

vuillaut · 2019-11-19T20:16:47Z

> > I tested with real data and it works fine and produces a dl1 h5 file that contains:
>
> Thanks a lot for giving it a try!
>
> > So everything seems fine to me. One question though: why does the real data h5 file still inherit the table /dl1/event/subarray/mc_shower?
>
> That's not intended indeed. Could you tell me the content of this table please?

@morcuended don't bother, I found it.

moralejo · 2019-11-19T23:13:44Z

> - always produce full DL1 by default (as it is done now)
> - remove images when not needed
> - recompute parameters to create a new DL1 file from a DL1 file (in this case, we need the images, but also we don't need to rewrite all parameters, most of them can be copied, so we go from full DL1 to full DL1)
>
> Would this approach be ok for you?

This is ok, though I guess we will do the image removal very often, to move around DL1b-only files.

> Unless there are arguments for it, I don't like too much the idea of splitting in DL1a/b by default.

I think the DL1 concept, containing both images and parameters is more designed for the final situation in which it will never contain all pixels of each camera, but just those present in DL0, or even just those surviving cleaning. In that case, the pixel info is not so much of a "ballast" when one is interested only in DL1b.

morcuended · 2019-11-20T08:26:59Z

Good! These are the tables contained in the h5 file after your modification:

['dl1/event/telescope/image/LST_LSTCam',
 'dl1/event/telescope/parameters/LST_LSTCam']

Another question: should dl1 h5 file contain other tables with metadata? For example:

[ 'instrument/subarray/layout',
'instrument/telescope/camera/LSTCam-002',
'instrument/telescope/optics']

Or is it not needed at this data level anymore?

kosack · 2019-11-20T09:37:23Z

> In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

It doesn't really matter: in HDF5 you can create a file that has internal links to external files, so to the user it looks like a single file. I suggest that tools should expect them to be in one file (even if in reality we use separate files, since we can use this linking trick). I'm not sure how efficient it is to append datasets to an existing file, but I guess it should be possible to add the datasets to the existing file as well.

kosack · 2019-11-20T09:40:33Z

For LST analysis it may not matter, but note that in a full analysis once we start using reconstructions like ImPACT or Model3D, we need to have both the images (though perhaps with a loose cleaning, like standard + 3 extra dilation rings) and the parameters as well (for starting points to the fit).

vuillaut · 2019-11-20T10:53:57Z

> Good! These are the tables contained in the h5 file after your modification:
>
> ['dl1/event/telescope/image/LST_LSTCam',
>  'dl1/event/telescope/parameters/LST_LSTCam']
>
> Another question: should dl1 h5 file contain other tables with metadata? For example:
>
> [ 'instrument/subarray/layout',
> 'instrument/telescope/camera/LSTCam-002',
> 'instrument/telescope/optics']
>
> Or is it not needed at this data level anymore?

Hum you are right, I would prefer to have these tables in the file. I'll have a look.
@morcuended do you mind re-testing with real data please?

morcuended · 2019-11-20T12:04:45Z

> Hum you are right, I would prefer to have these tables in the file. I'll have a look.
> @morcuended do you mind re-testing with real data please?

Sure, working on it

morcuended · 2019-11-20T13:59:40Z

These are the tables now

In [10]: get_dataset_keys(file)                                                                                               
Out[10]: 
['dl1/event/telescope/image/LST_LSTCam',
 'dl1/event/telescope/parameters/LST_LSTCam',
 'instrument/subarray/layout',
 'instrument/subarray/layout.__table_column_meta__',
 'instrument/telescope/camera/LSTCam-002',
 'instrument/telescope/camera/LSTCam-002.__table_column_meta__',
 'instrument/telescope/optics',
 'instrument/telescope/optics.__table_column_meta__']

and they are being filled properly in the test I've done.

So green light from my side for the merging
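
The manual check above can be turned into a small helper. This is a minimal sketch that assumes the list of keys has already been obtained (for example from `get_dataset_keys`, as in the session above). `missing_tables`, `unexpected_mc_tables` and `REQUIRED_TABLES` are hypothetical names, and the required set simply mirrors the tables listed in this comment.

```python
REQUIRED_TABLES = {
    "dl1/event/telescope/image/LST_LSTCam",
    "dl1/event/telescope/parameters/LST_LSTCam",
    "instrument/subarray/layout",
    "instrument/telescope/camera/LSTCam-002",
    "instrument/telescope/optics",
}


def missing_tables(keys, required=REQUIRED_TABLES):
    """Return the required table paths absent from `keys`, sorted."""
    return sorted(set(required) - set(keys))


def unexpected_mc_tables(keys):
    """Return tables that should only exist for simulations (e.g. mc_shower)."""
    return sorted(k for k in keys if k.rsplit("/", 1)[-1].startswith("mc_"))
```

For a real-data file, both functions would be expected to return an empty list.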

vuillaut · 2019-11-20T14:38:21Z

> These are the tables now
>
> In [10]: get_dataset_keys(file)
> Out[10]:
> ['dl1/event/telescope/image/LST_LSTCam',
>  'dl1/event/telescope/parameters/LST_LSTCam',
>  'instrument/subarray/layout',
>  'instrument/subarray/layout.__table_column_meta__',
>  'instrument/telescope/camera/LSTCam-002',
>  'instrument/telescope/camera/LSTCam-002.__table_column_meta__',
>  'instrument/telescope/optics',
>  'instrument/telescope/optics.__table_column_meta__']
>
> and they are being filled properly in the test I've done.
>
> So green light from my side for the merging

thank you so much for the tests and review @morcuended

vuillaut · 2019-11-20T14:39:58Z

So we go on and you @morcuended take over applying pointing modification discussed in #218

vuillaut added 2 commits November 4, 2019 17:35

adding condition on simu data

f963c3b

adding script r0 to dl1 for real data

36dc87d

garciagenrique reviewed Nov 4, 2019

lstchain/reco/dl0_to_dl1.py

garciagenrique reviewed Nov 4, 2019

lstchain/reco/dl0_to_dl1.py

garciagenrique approved these changes Nov 4, 2019

garciagenrique mentioned this pull request Nov 4, 2019

gain selector threshold in p.e. #200

Closed

vuillaut added 2 commits November 6, 2019 00:26

Merge branch 'master' into real_data

d952922

rename script real data

dcef681

vuillaut mentioned this pull request Nov 6, 2019

Pointing #195

Closed

Merge branch 'master' into real_data

1f205e7

morcuended reviewed Nov 6, 2019

vuillaut added 6 commits November 13, 2019 21:27

tel_id = 1

81d1bf3

tel_id = 1

9ce0582

some custom real data addition

6379a74

no mc_alt_tel with real data for now

f23effa

changes in cal scripts

015f6eb

fixing fill_mc for simu

081e956

rlopezcoto mentioned this pull request Nov 19, 2019

Script for processing real data r0 to dl1a+b #218

Closed

vuillaut mentioned this pull request Nov 19, 2019

Production scripts onsite #221

Closed

2 tasks

vuillaut added 3 commits November 19, 2019 17:17

merge master

86c2948

adding alt_tel and az_tel for real data

f01bee3

in case of real data, use az_tel and alt_tel

c3ce1b2

vuillaut added the "ready for review" label Nov 19, 2019

morcuended reviewed Nov 19, 2019

vuillaut changed the title from "WIP: Real data analysis" to "Real data analysis" Nov 19, 2019

morcuended reviewed Nov 19, 2019

morcuended approved these changes Nov 19, 2019

write subarray tables only for simu

f82c1cb

morcuended approved these changes Nov 20, 2019

add write_array_info for real data

130e15a

morcuended approved these changes Nov 20, 2019

vuillaut merged commit 4341c3e into cta-observatory:master Nov 20, 2019

vuillaut deleted the real_data branch November 20, 2019 14:41
