Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Real data analysis #201

Merged
merged 16 commits into from
Nov 20, 2019
Merged

Real data analysis #201

merged 16 commits into from
Nov 20, 2019

Conversation

vuillaut
Copy link
Member

@vuillaut vuillaut commented Nov 4, 2019

Some changes to include real data handling in the analysis.

The approach is to keep the same functions as much as possible for MC or real data.
There are discrepancies between real data and MC right now that need some handling though.
This is done with if condition on simu metadata.
So this PR should not change anything for MC analysis (unit tests are passing so the pipeline works).

In the future, I hope we can progressively get rid of these conditions.

Copy link
Collaborator

@garciagenrique garciagenrique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK for me, a second check could be useful, though.
Just check the comment about the gain selection.

@vuillaut vuillaut mentioned this pull request Nov 6, 2019
if is_simu:
dl1_container.mc_energy = event.mc.energy.value
dl1_container.log_mc_energy = np.log10(event.mc.energy.value * 1e3)

dl1_container.log_intensity = np.log10(dl1_container.intensity)
dl1_container.gps_time = event.trig.gps_time.value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the pointing interpolation we need the event time. Since time information is not being written right now to the event.trig.gps_time field, from the time being I've been getting the event time as:

from astropy.time import Time 
start_ntp = event.lst.tel[tel_id].svc.date
ntp_time = start_ntp + event.r0.tel[telescope_id].trigger_time
utc_time = Time(datetime.utcfromtimestamp(ntp_time))
gps_time = utc_time.gps
dl1_container.gps_time = gps_time

This should be modified back to the line you wrote when gps_time field is filled up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks.
I can integrate this change for real data.
Or you can PR into vuillaut:real_data so we aggregate all the changes we need to make for real data analysis and add them at once to lstchain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I did not know I could do that. I will work on your branch then.

@morcuended
Copy link
Member

In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

@vuillaut
Copy link
Member Author

In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

In my opinion, no, we want to follow CTA recommendations and produce files corresponding to the official structure.
I see the interest of having DL1b of course. Maybe we can have a simple script remove_image_node for people to use on their own if they want to move data around more easily

@labsaha
Copy link
Collaborator

labsaha commented Nov 19, 2019

In yesterday's meeting @moralejo was asking about two separate files with same structure, but in one a is filled and in other one b is filled. Can it be done easily @vuillaut ?

@morcuended
Copy link
Member

morcuended commented Nov 19, 2019

In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

In my opinion, no, we want to follow CTA recommendations and produce files corresponding to the official structure.
I see the interest of having DL1b of course. Maybe we can have a simple script remove_image_node for people to use on their own if they want to move data around more easily

This would be easier than creating two h5 files already within the code

@moralejo
Copy link
Collaborator

In yesterday's meeting @moralejo was asking about two separate files with same structure, but in one a is filled and in other one b is filled. Can it be done easily @vuillaut ?

Yes, that is what i was asking for. It is not just that image-less files are easier to move around (that is indeed one reason). It is also that we will want to have different DL1b versions (mostly, different cleaning approaches) out of the same DL1a. This can be done with several "versions" of DL1b inside a single file, or with separate DL1b files for each cleaning approach.
In both cases these would be "standard" DL1 files in which either the images, or the parameters, are missing.
So, would it be ok to have DL1 files with only DL1a inside?

@vuillaut vuillaut mentioned this pull request Nov 19, 2019
2 tasks
@vuillaut vuillaut added the ready for review Pull request is ready for review label Nov 19, 2019
@vuillaut
Copy link
Member Author

vuillaut commented Nov 19, 2019

In yesterday's meeting @moralejo was asking about two separate files with same structure, but in one a is filled and in other one b is filled. Can it be done easily @vuillaut ?

Yes, that is what i was asking for. It is not just that image-less files are easier to move around (that is indeed one reason). It is also that we will want to have different DL1b versions (mostly, different cleaning approaches) out of the same DL1a. This can be done with several "versions" of DL1b inside a single file, or with separate DL1b files for each cleaning approach.
In both cases these would be "standard" DL1 files in which either the images, or the parameters, are missing.
So, would it be ok to have DL1 files with only DL1a inside?

Ok I see, for the second point, the current approach I have been applying is:

  • always produce full DL1 by default (as it is done now)
  • remove images when not needed
  • recompute parameters to create a new DL1 file from a DL1 file (in this case, we need the images, but also we don't need to rewrite all parameters, most of them can be copied, so we go from full DL1 to full DL1)

Would this approach be ok for you?
Unless there are arguments for it, I don't like too much the idea of splitting in DL1a/b by default.

I will make another PR for the dl1_to_dl1 script.

writer.write(table_name=f'telescope/image/{tel_name}',
containers=[event.dl0, tel, extra_im])
containers=[event.r0, tel, extra_im])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we write event.dl1 container with images and pulse times here instead of event.r0 or event.dl0 as you had before? This is something I doubted about when I was writing UCM script

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The images and pulse time come from tel. event.r0 includes info such as obs_id and event_id

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true

@vuillaut vuillaut changed the title WIP: Real data analysis Real data analysis Nov 19, 2019
os.makedirs(args.outdir, exist_ok=True)

dl0_to_dl1.allowed_tels = {1, 2, 3, 4}
output_filename = args.outdir + '/dl1_' + os.path.basename(args.infile).rsplit('.', 1)[0] + '.h5'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will produce filenames as: dl1_LST-1.1.Run01555.0000.fits.h5. Maybe we should take out the fits part as well to have it like: dl1_LST-1.1.Run01555.0000.h5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, as right now it produces things like ***.simtel.h5.
cons: that's a double file extension
pros: it keeps the info about origin
I don't have strong preferences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it as it is then

@morcuended
Copy link
Member

morcuended commented Nov 19, 2019

I tested with real data and it works fine and produces a dl1 h5 file that contains:

File(filename=dl1_data/dl1_LST-1.1.Run01566.0000.fits.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/dl1 (Group) ''
/dl1/event (Group) ''
/dl1/event/subarray (Group) ''
/dl1/event/subarray/mc_shower (Table(53000,)) 'Storage of DL0Container,MCEventContainer'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "mc_alt": Float64Col(shape=(), dflt=0.0, pos=1),
  "mc_az": Float64Col(shape=(), dflt=0.0, pos=2),
  "mc_core_x": Float64Col(shape=(), dflt=0.0, pos=3),
  "mc_core_y": Float64Col(shape=(), dflt=0.0, pos=4),
  "mc_energy": Float64Col(shape=(), dflt=0.0, pos=5),
  "mc_h_first_int": Float64Col(shape=(), dflt=0.0, pos=6),
  "mc_x_max": Float64Col(shape=(), dflt=0.0, pos=7),
  "obs_id": Int64Col(shape=(), dflt=0, pos=8)}
  byteorder := 'little'
  chunkshape := (910,)
/dl1/event/subarray/trigger (Table(53000,)) 'Storage of DL0Container,CentralTriggerContainer'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "obs_id": Int64Col(shape=(), dflt=0, pos=1)}
  byteorder := 'little'
  chunkshape := (4096,)
/dl1/event/telescope (Group) ''
/dl1/event/telescope/image (Group) ''
/dl1/event/telescope/image/LST_LSTCam (Table(51228,)) 'Storage of R0Container,DL1CameraContainer,ExtraImageInfo'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "image": Float64Col(shape=(1855,), dflt=0.0, pos=1),
  "obs_id": Int64Col(shape=(), dflt=0, pos=2),
  "pulse_time": Float64Col(shape=(1855,), dflt=0.0, pos=3),
  "tel_id": Int64Col(shape=(), dflt=0, pos=4)}
  byteorder := 'little'
  chunkshape := (8,)
/dl1/event/telescope/parameters (Group) ''
/dl1/event/telescope/parameters/LST_LSTCam (Table(51228,)) 'Storage of DL1ParametersContainer'
  description := {
  "event_id": Int64Col(shape=(), dflt=0, pos=0),
  "intensity": Float64Col(shape=(), dflt=0.0, pos=1),
  "intercept": Float64Col(shape=(), dflt=0.0, pos=2),
  "kurtosis": Float64Col(shape=(), dflt=0.0, pos=3),
  "leakage": Float64Col(shape=(), dflt=0.0, pos=4),
  "length": Float64Col(shape=(), dflt=0.0, pos=5),
  "log_intensity": Float64Col(shape=(), dflt=0.0, pos=6),
  "mc_core_distance": Float64Col(shape=(), dflt=0.0, pos=7),
  "n_islands": Int64Col(shape=(), dflt=0, pos=8),
  "obs_id": Int64Col(shape=(), dflt=0, pos=9),
  "phi": Float64Col(shape=(), dflt=0.0, pos=10),
  "psi": Float64Col(shape=(), dflt=0.0, pos=11),
  "r": Float64Col(shape=(), dflt=0.0, pos=12),
  "skewness": Float64Col(shape=(), dflt=0.0, pos=13),
  "tel_id": Int64Col(shape=(), dflt=0, pos=14),
  "tel_pos_x": Float64Col(shape=(), dflt=0.0, pos=15),
  "tel_pos_y": Float64Col(shape=(), dflt=0.0, pos=16),
  "tel_pos_z": Float64Col(shape=(), dflt=0.0, pos=17),
  "time_gradient": Float64Col(shape=(), dflt=0.0, pos=18),
  "width": Float64Col(shape=(), dflt=0.0, pos=19),
  "wl": Float64Col(shape=(), dflt=0.0, pos=20),
  "x": Float64Col(shape=(), dflt=0.0, pos=21),
  "y": Float64Col(shape=(), dflt=0.0, pos=22)}
  byteorder := 'little'
  chunkshape := (356,)

So everything seems fine to me. One question though: Why real data h5 file still inherits the table /dl1/event/subarray/mc_shower?

@vuillaut
Copy link
Member Author

I tested with real data and it works fine and produces a dl1 h5 file that contains:

Thanks a lot for giving it a try!

So everything seems fine to me. One question though: Why real data h5 file still inherits the table /dl1/event/subarray/mc_shower?

That's not intended indeed. Could you tell me the content of this table please?

@vuillaut
Copy link
Member Author

I tested with real data and it works fine and produces a dl1 h5 file that contains:

Thanks a lot for giving it a try!

So everything seems fine to me. One question though: Why real data h5 file still inherits the table /dl1/event/subarray/mc_shower?

That's not intended indeed. Could you tell me the content of this table please?

@morcuended don't bother, I found it.

@moralejo
Copy link
Collaborator

  • always produce full DL1 by default (as it is done now)
  • remove images when not needed
  • recompute parameters to create a new DL1 file from a DL1 file (in this case, we need the images, but also we don't need to rewrite all parameters, most of them can be copied, so we go from full DL1 to full DL1)

Would this approach be ok for you?

This is ok, though I guess we will do the image removal very often, to move around DL1b-only files.

Unless there are arguments for it, I don't like too much the idea of splitting in DL1a/b by default.

I think the DL1 concept, containing both images and parameters is more designed for the final situation in which it will never contain all pixels of each camera, but just those present in DL0, or even just those surviving cleaning. In that case, the pixel info is not so much of a "ballast" when one is interested only in DL1b.

@morcuended
Copy link
Member

Good! These are the tables contained in the h5 file after your modification:

['dl1/event/telescope/image/LST_LSTCam',
 'dl1/event/telescope/parameters/LST_LSTCam']

Another question: should dl1 h5 file contain other tables with metadata? For example:

[ 'instrument/subarray/layout',
'instrument/telescope/camera/LSTCam-002',
'instrument/telescope/optics']

Or is it not needed at this data level anymore?

@kosack
Copy link

kosack commented Nov 20, 2019

In the end, do we want to have two separated output files corresponding to dl1a and dl1b data? @vuillaut @rlopezcoto @moralejo

it doesn't really matter- in HDF5 you can create a file that has internal symlinks to external files, so to the user it looks like a single file. I suggest that tools should expect them to be in one file (Even if in reality we use separate files, since we can use this linking trick). I'm not sure how efficient appending to an existing file is to add datasets, but I guess it should be possible to add the datasets to the existing file as well.

@kosack
Copy link

kosack commented Nov 20, 2019

For LST analysis it may not matter, but note that in a full analysis once we start using reconstructions like ImPACT or Model3D, we need to have both the images (though perhaps with a loose cleaning, like starndard + 3 extra dilation rings) and the parameters as well (for starting points to the fit).

@vuillaut
Copy link
Member Author

vuillaut commented Nov 20, 2019

Good! These are the tables contained in the h5 file after your modification:

['dl1/event/telescope/image/LST_LSTCam',
 'dl1/event/telescope/parameters/LST_LSTCam']

Another question: should dl1 h5 file contain other tables with metadata? For example:

[ 'instrument/subarray/layout',
'instrument/telescope/camera/LSTCam-002',
'instrument/telescope/optics']

Or is it not needed at this data level anymore?

Hum you are right, I would prefer to have these tables in the file. I'll have a look.
@morcuended do you mind re-testing with real data please?

@morcuended
Copy link
Member

Hum you are right, I would prefer to have these tables in the file. I'll have a look.
@morcuended do you mind re-testing with real data please?

Sure, working on it

@morcuended
Copy link
Member

morcuended commented Nov 20, 2019

These are the tables now

In [10]: get_dataset_keys(file)                                                                                               
Out[10]: 
['dl1/event/telescope/image/LST_LSTCam',
 'dl1/event/telescope/parameters/LST_LSTCam',
 'instrument/subarray/layout',
 'instrument/subarray/layout.__table_column_meta__',
 'instrument/telescope/camera/LSTCam-002',
 'instrument/telescope/camera/LSTCam-002.__table_column_meta__',
 'instrument/telescope/optics',
 'instrument/telescope/optics.__table_column_meta__']

and they are being filled properly in the test I've done.

So green light from my side for the merging

@vuillaut
Copy link
Member Author

These are the tables now

In [10]: get_dataset_keys(file)                                                                                               
Out[10]: 
['dl1/event/telescope/image/LST_LSTCam',
 'dl1/event/telescope/parameters/LST_LSTCam',
 'instrument/subarray/layout',
 'instrument/subarray/layout.__table_column_meta__',
 'instrument/telescope/camera/LSTCam-002',
 'instrument/telescope/camera/LSTCam-002.__table_column_meta__',
 'instrument/telescope/optics',
 'instrument/telescope/optics.__table_column_meta__']

and they are being filled properly in the test I've done.

So green light from my side for the merging

thank you so much for the tests and review @morcuended

@vuillaut
Copy link
Member Author

So we go on and you @morcuended take over applying pointing modification discussed in #218

@vuillaut vuillaut merged commit 4341c3e into cta-observatory:master Nov 20, 2019
@vuillaut vuillaut deleted the real_data branch November 20, 2019 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for review Pull request is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants