
ALICE tutorial session - ‘mini analysis’: centrality dependence of charged pion production at mid-rapidity

1 Step 1

In these exercises, we will take the example task that was explained in the lesson and use it as a basis to write a ‘mini-analysis’: the centrality dependence of charged pion production at mid-rapidity. Follow each of the steps in this document, and feel free to ask questions at any point!

1.1 Getting started - obtaining the code

Our starting point for these exercises is the example analysis ‘AliAnalysisTaskMyTask.cxx’ that was covered during the lesson. Of course it wouldn’t be very efficient to copy and paste the code for your ‘empty’ analysis task from the slides of the lecture. You can use different methods to obtain it, suggested here in order of adventurousness:

  1. The quickest way: download the .tar file that you can find attached to today’s session’s agenda here and extract the files to your local hard drive

  2. As we have learned in the previous days, programmers use version control systems for developing their code. Use what you have learned during our GIT lectures, and clone the example task repository. You can find it on https://github.com/rbertens/ALICE_analysis_tutorial

  3. The third approach is the most robust: if you have a GitHub account, you can make a fork of the example task repository by navigating to https://github.com/rbertens/ALICE_analysis_tutorial and clicking on ‘Fork’. Once you’ve forked the repository, you can proceed to clone it on your laptop; this allows you to develop the code and push commits to your own fork

Whichever approach you choose, the end result should be that you have these files stored on your laptop. One word of caution: the path in which you store these files cannot contain spaces! E.g. /home/my awesome task/ will not work; use /home/my_awesome_task/ [1].

1.2 Getting started - getting the data

To run this task, you will also need to download an input file AliAOD.root, which contains real events from the 2010 Pb–Pb run. You can choose a file from aliensh, but for your convenience you can also download one at https://cernbox.cern.ch/index.php/s/ZP2gJBE265FiSAX .

1.3 A first look at your task

As a start, just take a look at the code that makes up the task. You should see the files that were also covered in the first part of the talk:

  • AliAnalysisTaskMyTask.cxx

  • AliAnalysisTaskMyTask.h

  • AddMyTask.C

  • runAnalysis.C

If you cloned the git repository, you’ll also see a ‘.gitignore’ file [2]. Try to find where all the code snippets that were shown during the presentation fit in, and see whether you can answer the following questions:

  • which function is called for each event?

  • where are the output histograms defined?

  • where are the output histograms filled?

Remember that, for now, the name ‘AliAnalysisTaskMyTask’ doesn’t sound so bad, but when you develop a task ‘in real life’, make sure to give it a meaningful name that explains what the task is aimed at.

2 Step 2 - Running the task

It’s time to run this simple analysis. If you don’t have AliROOT running on your laptop already, ask for our help and/or team up with one of your fellow students. To run the task, open a terminal, source your AliROOT environment (think of the ‘alienv’ commands that Dario explained yesterday) and type

$ aliroot runAnalysis.C

This launches aliroot, and executes the macro ’runAnalysis.C’. The macro will

  • compile your analysis code

  • add your analysis task to the analysis manager

  • run the analysis

Make sure, though, that the macro ‘knows’ where you stored the AliAOD.root file on your laptop: if the AliAOD.root file is in a different folder than the analysis source code, open the runAnalysis.C file and change the lines where the input file is accessed.
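
In a typical runAnalysis.C this is done through a TChain; the relevant lines look roughly like the sketch below (the exact lines in your macro may differ, and the path is whatever you used on your machine):

    // create the input chain of AOD events and point it to your local file
    TChain* chain = new TChain("aodTree");
    chain->Add("/path/to/your/AliAOD.root");   // adjust this path to your own setup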

By default, the macro will run the task just on your laptop. Later on, we’ll see how you can run your analysis on GRID.

BEWARE: are you using ROOT 6, or running macOS High Sierra? We try to make the tutorial as compatible as possible with the latest standards, but there might be some surprises - ping us if something doesn’t work.

2.1 Take a look at the output

If all goes well, your simple task runs over the input data and fills one histogram with the distribution of the transverse momentum ($p_{\rm T}$) of all charged-particle tracks in all events. Once the task is done running, take a look at the output file AnalysisResults.root that has been generated. The easiest way to do so is to open a TBrowser (ROOT’s graphical browser) and look at the distribution in the histogram. No idea how to open a TBrowser, or what a TBrowser even is? Don’t hesitate to ask, and take a look at the ROOT user’s guide.

3 Step 3 - time to get active!

If you managed to run the task, congratulations! This is step one towards becoming a professional physicist (no turning back from here ...). If it didn’t work, ask for help - this, too, is a very significant part of becoming a physicist ...

At this point, it’s time to expand our task a bit. What we will do is

  • Add some event selection criteria

  • Add some track selection criteria

But let’s not go too fast.

3.1 Step 1 - the event loop

Collisions are referred to as ‘events’. In our analysis task, the function ‘UserExec’ is called once for each event in the input data set.

It’s time to add some code to our task. First, create additional output histograms in which you’ll store

  • The primary vertex positions along the beam line for all events

  • The centrality of all events

Just like the example histogram fHistPt, you have to add these histograms as members of your class. So, create the pointers in the header file, remember to initialize the pointers in your class constructors, and don’t forget to create instances of the histograms in the UserCreateOutputObjects method.
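
As a rough sketch, assuming the output list of the example task is called fOutputList and using the made-up names fHistVertexZ and fHistCentrality, these additions could look like:

    // in AliAnalysisTaskMyTask.h, next to fHistPt
    TH1F* fHistVertexZ;     //! primary vertex position along the beam line
    TH1F* fHistCentrality;  //! event centrality (V0M percentile)

    // in UserCreateOutputObjects(), after the existing fHistPt lines
    fHistVertexZ = new TH1F("fHistVertexZ", "vertex z;z_{vtx} (cm);counts", 100, -20., 20.);
    fHistCentrality = new TH1F("fHistCentrality", "centrality;centrality (%);counts", 100, 0., 100.);
    fOutputList->Add(fHistVertexZ);
    fOutputList->Add(fHistCentrality);

    // ... and add fHistVertexZ(0), fHistCentrality(0) to both constructor initialization lists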

One of the trickiest things in programming can be finding out which methods to use out of millions of lines of code. To help you out, here are two hints for obtaining the vertex position along the beam line and the collision centrality of our AOD events:

Float_t vertexZ = fAOD->GetPrimaryVertex()->GetZ();
Float_t centrality = fAOD->GetCentrality()->GetCentralityPercentile("V0M");

You can of course just copy-paste these lines into your task and hope that it works, but try to think of

  • do you understand the logic of these lines?

  • can you find out in which classes these methods are defined? [3]

Try to run the code again, after you’ve added these histograms.

3.2 All set?

It’s almost time to do some physics. Not all collisions in the detector are actually usable for physics analysis. Events which happen very far from the center of the detector, for example, are usually not used for analysis (any idea why?). To make sure that we only look at high quality data, we do ’event selection’. Let’s add some event selection criteria to your task.

  • modify the code such that only events with a primary vertex within 10 cm of the detector center are accepted: $|z_{\rm vtx}| < 10$ cm (a minimal sketch is given below)
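
A minimal sketch of such a cut in UserExec(), assuming fAOD and the histogram names from the previous step:

    // reject events with a primary vertex more than 10 cm from the detector center
    Float_t vertexZ = fAOD->GetPrimaryVertex()->GetZ();
    if(TMath::Abs(vertexZ) > 10.) return;   // skip this event entirely
    fHistVertexZ->Fill(vertexZ);            // fill only for accepted events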

Selections in the event sample or track sample are often referred to as ‘cuts’. Usually, we want to see how much data we ‘throw away’ when applying a cut. How would you go about doing that?
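
One common bookkeeping approach (just a sketch; fHistEvents is a made-up two-bin TH1F that you would book like the other histograms, and vertexZ is the variable from the sketch above) is a simple counter that is filled once before and once after the cut:

    fHistEvents->Fill(0.5);                  // bin 1: all events
    if(TMath::Abs(vertexZ) > 10.) return;
    fHistEvents->Fill(1.5);                  // bin 2: events that pass the vertex cut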

Besides rejecting data via cuts, we also select data while taking it, by ‘triggering’: selecting collisions in which we expect certain processes to have occurred. You can check which triggers are selected in line 27 of the AddMyTask.C macro - we won’t have time to cover triggers in detail, but if you have questions, ask!

4 Step 4 - The track loop

In this exercise we are going to study the particles that are produced in each event. Our end goal is to identify charged pions, but first we’ll familiarize ourselves with the analysis code some more.

  • plot the $\eta$ and $\varphi$ distributions of the particles for all accepted events in a two-dimensional histogram

  • make sure that the axes have proper ranges (how large is the detector?) and a reasonable number of bins (see the sketch below)
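
A possible way to book and fill such a histogram (the name fHistEtaPhi is just a suggestion, and the axis ranges are yours to choose):

    // in UserCreateOutputObjects()
    fHistEtaPhi = new TH2F("fHistEtaPhi", ";#eta;#varphi", 100, -1., 1., 100, 0., TMath::TwoPi());
    fOutputList->Add(fHistEtaPhi);

    // in the track loop in UserExec()
    fHistEtaPhi->Fill(track->Eta(), track->Phi());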

If you’ve added this histogram, run the analysis and look at the two-dimensional histogram. Does the distribution make sense?

4.1 Track selection

Just as some events are not fit for analysis, some tracks are not suitable for studying certain observables. In addition to ’event cuts’, we therefore have to apply ’track cuts’ as well. In ALICE, we usually use a system of predefined track selection criteria that are checked by reading the ’filterbit’ of a track - see the DPG talk!

  • Check in the code where the filterbit selection is done

  • Change the filterbit selection value from 1 to 128, and then to 512

How does your $\eta$-$\varphi$ distribution change if you run your task again with filterbit 512? What do you think is the cause of the difference? Please ask one of us to explain this in a bit more detail.
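
For reference, the filterbit check in the track loop looks roughly like the sketch below (the exact line in the example task may differ slightly; ‘i’ is the loop index over fAOD->GetNumberOfTracks()):

    AliAODTrack* track = static_cast<AliAODTrack*>(fAOD->GetTrack(i));
    if(!track || !track->TestFilterBit(512)) continue;   // try filterbits 1, 128 and 512 and compare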

TIP

  • If you run your task again, you will - most likely - overwrite the output of the previous run, which makes it hard to compare the distributions obtained with the different filterbits. It can be useful to keep all the output files you generate in a safe place on your hard drive.

  • ... if you keep the files, make sure that you will remember which event and track selections you used. In a few weeks, you will probably have forgotten which filterbit was set for a certain output file. You can create readme files, or add a histogram as a bookkeeping device.

5 Step 5 - particle identification using the TPC

In this section, we’ll try to identify particles using the amount of energy they have lost per distance traveled in the Time Projection Chamber (TPC). Some specific technical information on how to implement this in your task is given at the end of this document; the general steps to follow are explained here. First, the goals. What we want to do is

  • identify pions

  • check how ’pure’ our pion sample is

[Figure: measured TPC energy-loss signal versus momentum, with the expected signal for each particle species overlaid as lines]

We’ll do this by looking at the TPC signal and seeing how well this signal corresponds to the expected signal for a certain particle species. Look at the figure for clarification: the lines show the expected signal a particle species leaves in the TPC (‘the hypothesis’), and the colors show the actual signals that were deposited.

Start by storing the TPC signals of all charged particles in your events in a two-dimensional histogram (such as the one shown in the figure). First follow the ’technical steps’ from Sec. 9, and then try to make such a plot.

Hint: you can get this information by doing

fYour2DHistogram->Fill(track->P(), track->GetTPCsignal());

If your histogram is empty after running, try using Filterbit 1 rather than 128.

As a second step, identify the particles and store only the TPC signal that corresponds to pions. Use the ’NumberOfSigmasTPC’ functions explained in Sec. 9.

If you are confident that you’ve ’isolated’ the pions, create new histograms to store the pions’

  • $p_{T}$ spectrum

  • $\eta$

  • $\varphi$

Change the centrality of the collisions that you accept: select pions in the 0-10% and 50-60% centrality classes. Does the number of pions change in the way you would expect it to?
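
Putting the pieces together, a pion selection in a given centrality class could look something like the sketch below; the histogram names are placeholders, and the centrality window and 3-sigma cut are the values suggested above:

    // in UserExec(), after the vertex cut: keep e.g. only the 0-10% centrality class
    Float_t centrality = fAOD->GetCentrality()->GetCentralityPercentile("V0M");
    if(centrality < 0. || centrality > 10.) return;

    // in the track loop, after the filterbit check: keep only pion candidates
    if(TMath::Abs(fPIDResponse->NumberOfSigmasTPC(track, AliPID::kPion)) < 3.) {
        fHistPionPt->Fill(track->Pt());
        fHistPionEta->Fill(track->Eta());
        fHistPionPhi->Fill(track->Phi());
    }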

6 Step 6 - run your code on GRID

6.1 A little test ..

If the analysis code is working, it’s time to launch it on GRID, to see what the results look like when run over a large data set.

  • open the runAnalysis.C macro, and set the flag ’local’ on line 4, to kFALSE

  • keep the flag ’gridTest’ on line 6 set to kTRUE

  • read through the configuration of the AliAnalysisAlien class, do all the settings make sense?

If you think everything is OK, initialize a token (alien-token-init <username>) and launch the analysis.
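
For orientation, an abridged grid configuration in runAnalysis.C typically looks something like the sketch below; the AliPhysics version, directories and run number are placeholders, so compare with the actual values in your macro (where ‘mgr’ is the analysis manager):

    AliAnalysisAlien *alienHandler = new AliAnalysisAlien();
    alienHandler->SetAliPhysicsVersion("vAN-XXXXXXXX-1");          // placeholder AliPhysics tag
    alienHandler->SetGridDataDir("/alice/data/2010/LHC10h");        // example 2010 Pb-Pb data directory
    alienHandler->SetDataPattern("*ESDs/pass2/AOD160/*AOD.root");   // example AOD pattern
    alienHandler->AddRunNumber(139510);                             // example run number
    alienHandler->SetGridWorkingDir("myWorkingDir");
    alienHandler->SetGridOutputDir("myOutputDir");
    alienHandler->SetMergeViaJDL(kTRUE);                            // merge the output on grid
    alienHandler->SetRunMode("test");                               // "test" for the grid test, "full" for the real thing
    mgr->SetGridHandler(alienHandler);                              // attach the handler to the analysis manager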

6.2 And then the real thing

If the GRID test was successful, you can launch your analysis on GRID. Set the flag ’gridTest’ on line 6 of the runAnalysis macro to kFALSE, and launch your analysis. Once it is running, you can track the progress at https://alimonitor.cern.ch/users/jobs.jsp.

Once your jobs are all done, merge the output using the steps outlined on slides xxx of today’s slides.

7 .... if there is time ... (likely not)

7.1 Minimal systematics

As you can see in the figure, the TPC signals of different particle species cross each other. In these regions, it is difficult (or impossible) to identify particles. Because of this ambiguity, we have to estimate the uncertainty that our identification routine introduces.

  • run the main code multiple times, varying the number of standard deviations within which you require the signal to lie: from 3 to 4, and from 3 to 2

  • how do your spectra change? do these changes make sense to you?

If you are very enthusiastic,

  • try to estimate the systematic uncertainty that particle identification introduces to your measurement

7.2 Make it presentable

By now, you have made quite a few nice plots. Modify the code such that the labels, titles, markers, colors, and legends of your plots look decent and presentable to your colleagues.

  • Did you know that you can use LaTeX in ROOT? This comes in handy when you are labeling axes, etc.

  • Store one or two of your pictures. You can store them as ROOT files, macros, or as graphics. Always use vector graphics (.ps, .pdf)!

  • Never be afraid to make titles and axis labels larger; it’s better if things look a bit ’cartoonish’ than if half of the collaboration cannot read what’s on your plots - especially when they are broadcast or projected.

In practice, we usually don’t use the TBrowser for ’postprocessing’, but write dedicated code that opens our AnalysisResults.root files, does some final calculations, and makes our plots pretty.
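
As an illustration, a minimal postprocessing macro might look like the sketch below; it assumes the output container in AnalysisResults.root is called ‘MyOutputContainer’ and holds the fHistPt histogram, so adjust the names to match your own output:

    // plotPt.C - minimal postprocessing sketch
    void plotPt() {
        TFile* file = TFile::Open("AnalysisResults.root");
        TList* list = (TList*)file->Get("MyOutputContainer");   // container name is an assumption
        TH1F* histPt = (TH1F*)list->FindObject("fHistPt");
        histPt->SetTitle("charged tracks;#it{p}_{T} (GeV/#it{c});counts");
        TCanvas* canvas = new TCanvas("canvas", "canvas");
        histPt->Draw();
        canvas->SaveAs("pt_spectrum.pdf");                       // vector graphics, as advised above
    }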

8 Some hints

To create documentation of analysis code, we use a system called ’Doxygen’. You can find explanation about Doxygen at this webpage: https://dberzano.github.io/alice/doxygen/.

The Doxygen documentation of AliROOT and AliPhysics is available here http://alidoc.cern.ch/ . It is generated daily.

9 PID response

To speed things up a bit, the code that you need for exercises 4 and onwards is given below.

To identify particles, you will add an additional task to your analysis: the ‘PID response’ task. This task makes sure that the parametrizations that we use for e.g. the specific energy loss in the TPC are loaded. To add this task to your analysis, open your runAnalysis.C macro and add the following lines. Make sure that these lines are called before your own task is added to the analysis manager, since your task will depend on this one:

   // load the necessary macros
   gROOT->LoadMacro("$ALICE_ROOT/ANALYSIS/macros/AddTaskPIDResponse.C");
   AddTaskPIDResponse();

As a second step, you’ll have to make some changes to the header of your analysis. First, you’ll add another class member: a pointer to the AliPIDResponse object that we’ll use in our class. To do this, remember how you added pointers to histograms earlier in this tutorial, and add

  • a forward declaration of the type, i.e. ‘class AliPIDResponse’

  • the pointer itself, i.e. ‘AliPIDResponse* fPIDResponse; //! pid response object’

Don’t forget to add this new pointer fPIDResponse to the member initialization list in your class constructors (in the .cxx files)!
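
For example, the default constructor could be extended along these lines (a sketch, assuming the member names used in the example task); do the same in the constructor that takes a name argument:

    AliAnalysisTaskMyTask::AliAnalysisTaskMyTask() : AliAnalysisTaskSE(),
        fAOD(0), fOutputList(0), fHistPt(0), fPIDResponse(0)
    {
        // default constructor: nothing is allocated here
    }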

After making changes to the header, you can add the actual identification routine. This will be done in the implementation of the class. You need to go through a few steps:

  • Retrieve the AliPIDResponse object from the analysis manager;

  • make our pointer ‘fPIDResponse’ point to the AliPIDResponse object;

  • include the header for the AliPIDResponse class, so that the compiler knows how it is defined

The code to do this might be a bit difficult to figure out by yourself, so it’s given below:

        AliAnalysisManager *man = AliAnalysisManager::GetAnalysisManager();
        if (man) {
            AliInputEventHandler* inputHandler = (AliInputEventHandler*)(man->GetInputEventHandler());
            if (inputHandler)   fPIDResponse = inputHandler->GetPIDResponse();
        }

Do you understand

  • what happens in each line?

  • in which function (UserCreateOutputObjects, UserExec, Terminate, etc.) should you put this code?

Now that we’ve retrieved our PID response object, it’s time to use it. To extract information for different particle species, you can use the functions

    Double_t kaonSignal = fPIDResponse->NumberOfSigmasTPC(track, AliPID::kKaon);
    Double_t pionSignal = fPIDResponse->NumberOfSigmasTPC(track, AliPID::kPion);
    Double_t protonSignal = fPIDResponse->NumberOfSigmasTPC(track, AliPID::kProton);

What these functions return is the deviation of a track’s measured signal from the expected signal for that particle species, expressed in standard deviations ($\sigma$). To identify a particle, we’ll require that its PID response signal lies within 3 standard deviations of the expected signal. To check this, you can write a few lines such as

      if(TMath::Abs(fPIDResponse->NumberOfSigmasTPC(track, AliPID::kPion)) < 3.) {
            // yippee, I'm a pion
      }

  1. If you’re interested why, you can try to introduce spaces and later see what happens when we compile the code.

  2. well, it’s a hidden file ... but it’s there

  3. Hint: fAOD is a pointer of type AliAODEvent. You can search for the implementation of this class on your laptop, or use the online ‘clickable’ documentation of AliROOT http://alidoc.cern.ch/
