Analysis

Missing things and bugs

logging waith lstchain scripts
run pointings
irf plots

Main Snakefile

# Definitions only, no actual target
include: "definitions.smk"
include: "rules/dl1.smk"
include: "rules/mc.smk"
include: "rules/dl2.smk"
include: "rules/dl3.smk"
include: "rules/dl4.smk"
include: "rules/dl5.smk"

rule all:
    input:
        rules.dl1.input,
        rules.dl2.input,
        rules.mc.input,
        rules.dl3.input,
        rules.dl4.input,
        rules.dl5.input

rule plots:
    input:
       run_selection_plots,
       MC_NODES_IRFs,
       DL3_PLOTS,
       dl4_plots,
       dl5_plots,

Defining Paths etc

From the config files in ./configs, the relevant variables are constructed and defined in ./workflow/definitions.smk. This includes:

Conda enviroments
Paths to configs
Output directories

I kind of duplicate a lot of effort by collecting them here in some dictionaries and the index into them in the subtasks instead of just defining all variables directly, but in my mind this setup makes it easier to reuse the individual stages and reduces mental overhead because the variables are actually in the same file. Preference I guess.

It is also not really “nice” to have these things defined as `Path`-Objects, absolute them and then construct `Path`-Objects again, but I had some issues with how `Snakemake` defines its `cwd` and this fixed it. Not claiming its the best way to do it, but it should be fine honestly. That little overhead is really not holding us back here.

import json
import os
from pathlib import Path

# "Main" paths. Everuthing else is relative to these
scripts_dir = Path("scripts")
env_dir = Path("workflow/envs")
config_dir = Path(config.get("config_dir", "../lst-analysis-config"))
main_config_path = (config_dir / "lst_agn.json").absolute()
build = Path("build") / config_dir.name
analyses = [
    x.name for x in config_dir.iterdir() if x.name.startswith("analysis") and x.is_dir()
]

# Using the one used for the dl1 MCs as I dont reprocess anything
# Best might be to reproduce dl1 mc and data?
# For now this is fine as mcpipe uses the default config from lstchain
# (at least in my test production) and lstchain versions dont differ much in the output
# (hopefully). Thats a fat TODO though...
# lstchain_config = build / "lstchain_config.json"
MATPLOTLIBRC = os.environ.get("MATPLOTLIBRC", config_dir / "matplotlibrc")

with open(main_config_path, "r") as f:
    config_agn = json.load(f)
PRODUCTION = config_agn["production"]
DECLINATION = config_agn["declination"]
train_size = config_agn["train_size"]


# Set all configs
# .absolute(), because I am paranoid. Thats a TODO kinda
# It seems to be, that snakemake changes the current working directory and then local paths fail
# Per Analysis configs are not in here. They have fixed names in the analysis-* dirs
CONFIGS = {
    "agn": (config_dir / "lst_agn.json").absolute(),
    "data_selection": (config_dir / "data_selection.json").absolute(),
    "irf_tool": (config_dir / "irf_tool_config.json").absolute(),
    "lstchain": (config_dir / "lstchain_config.json").absolute(),
}

OUTDIRS = {
    "data_selection": (build / "data_selection").absolute(),
    "mc_nodes": (
        build / "mc_nodes"
    ).absolute(),  # This is not really configurable, its hard coded in the link script
    "mc": (build / "mc").absolute(),
    "models": (build / "models").absolute(),
    "dl1": (build / "dl1").absolute(),
    "dl2": (build / "dl2").absolute(),
    "dl3": (build / "dl3").absolute(),
    "irfs": (build / "irfs").absolute(),
    "dl4": (build / "dl4").absolute(),
    "dl5": (build / "dl5").absolute(),
    "dm": (build / "dm").absolute(),
}

# Set all enviroments
ENVS = {
    "lstchain": config_agn.get("lstchain_enviroment", "lstchain-v0.10.3"),
    "gammapy": (env_dir / "agn-analysis.yml").absolute(),
    "data_selection": (env_dir / "data-selection.yml").absolute(),
    "background": (env_dir / "background.yml").absolute(),
    "dark_matter": (env_dir / "dark_matter.yml").absolute(),
}

SCRIPTS = {
    "data_selection": (scripts_dir / "data_selection").absolute(),
    "mc": (scripts_dir / "mc").absolute(),
    "irfs": (scripts_dir / "irfs").absolute(),
    "dl1": (scripts_dir / "dl1").absolute(),
    "dl2": (scripts_dir / "dl2").absolute(),
    "dl3": (scripts_dir / "dl3").absolute(),
    "dl4": (scripts_dir / "dl4").absolute(),
    "dl5": (scripts_dir / "dl5").absolute(),
    "dm": (scripts_dir / "dm").absolute(),
}


# TODO This is the most critical part as the further evaluation depends on this checkpoint
# Have to make sure this works as expected
def RUN_IDS(wildcards):
    with open(checkpoints.run_ids.get(**wildcards).output.runlist, "r") as f:
        runs = json.load(f)
    return sorted(set(chain(*runs.values())))


def MC_NODES(wildcards):
    exists = Path(checkpoints.link_mc.get(**wildcards).output.dummy).exists()
    mc_nodes = Path(OUTDIRS["mc_nodes"])/ f"{wildcards.particle}"
    nodes = [x.name for x in mc_nodes.glob("*") if x.is_dir()]
    return nodes

def MC_NODES_DL1(wildcards):
    out = Path(OUTDIRS["mc"]) / f"{wildcards.particle}/dl1"
    nodes = MC_NODES(wildcards)
    return [out/f"{node}_{wildcards.train_or_test}.dl1.h5"
            for node in nodes]

def MC_NODES_DL2(wildcards):
    out = Path(OUTDIRS["mc"]) / f"{wildcards.particle}/dl2"
    nodes = MC_NODES(wildcards)
    return [out/f"{node}_{wildcards.train_or_test}.dl2.h5" for node in nodes]

# TODO: cuts are not really IRFs, should separate that.
# Add radmax here if 1D
irfs_to_produce = ["aeff", "gh_cut", "edisp", "psf"]  # TODO script missing
dl3_plot_types = ["theta2", "skymap", "counts_after_cuts", "bkg"]

def MC_NODES_IRFs(wildcards):
    exists = Path(checkpoints.link_mc.get(**wildcards).output.dummy).exists()
    mc_nodes = Path(OUTDIRS["mc_nodes"])/ "GammaDiffuse"
    nodes = [x.name for x in mc_nodes.glob("*") if x.is_dir()]
    out = Path(OUTDIRS["irfs"]) / f"{wildcards.analysis}/plots"
    return [out/f"{irf}/{irf}_{node}.pdf" for node in nodes for irf in irfs_to_produce]

models_to_train = [
    Path(OUTDIRS["models"]) / "reg_energy.sav",
    Path(OUTDIRS["models"]) / "cls_gh.sav",
    Path(OUTDIRS["models"]) / "reg_disp_norm.sav",
    Path(OUTDIRS["models"]) / "cls_disp_sign.sav",
]

def DL2_FILES(wildcards):
    ids = RUN_IDS(wildcards)
    out = Path(OUTDIRS["dl2"])
    return [out / f"LST-1.Run{run_id}.dl2.h5" for run_id in ids]


def DL3_FILES(wildcards):
    ids = RUN_IDS(wildcards)
    out = Path(OUTDIRS["dl3"])
    return [out / f"{wildcards.analysis}/LST-1.Run{run_id}.dl3.fits.gz" for run_id in ids]


def DL3_RUN_PLOTS(wildcards):
    ids = RUN_IDS(wildcards)
    per_run = [dl3 / f"{analysis}/plots/{p}/{p}_{run}.pdf" for p in dl3_plot_types for run in ids for analysis in analyses]
    # bkg does not get stacked (yet?)
    return per_run + [dl3/f"{analysis}/plots/{p}/{p}_stacked.pdf" for p in dl3_plot_types[:-1] for analysis in analyses for plot in dl4_plot_types]



def DL3_IRF_PLOTS(wildcards):
    ids = RUN_IDS(wildcards)
    out = Path(OUTDIRS["dl3"])
    return [out / f"{analysis}/plots/{irf}/{irf}_{run_id}.pdf" for irf in irfs_to_produce for run_id in ids for analysis in analyses for plot in dl4_plot_types]


def DL3_PLOTS(wildcards):
    runs = DL3_RUN_PLOTS(wildcards)
    irfs = DL3_IRF_PLOTS(wildcards)
    compare_plots=[dl3/f"{analysis}/plots/{x}.pdf" for x in ("irfs", "rates") for analysis in analyses]
    return runs + irfs + compare_plots


def IRF_FILES(wildcards):
    ids = RUN_IDS(wildcards)
    out = Path(OUTDIRS["dl3"])
    return [out / f"{wildcards.analysis}/irfs_{run_id}.fits.gz" for run_id in ids for plot in dl4_plot_types]

def BKG_FILES(wildcards):
    ids = RUN_IDS(wildcards)
    out = Path(OUTDIRS["dl3"])
    return [out / f"{wildcards.analysis}/bkg_{run_id}.fits.gz" for run_id in ids for plot in dl4_plot_types]

DL1

This stage is arguably the most comlicated one. On the one hand, I do not even produce the dl1 files, instead using the LSTOSA files, but on the other hand this is where the magic happens as we go from “files somewhere on the cluster” to “nicely organized in the build directory”. At a previous point in time, this was referred to as linking and selecting rather than dl1, but I wanted to have a structure where every stage was more or less one datalevel, because I disliked the “preselection, selection, selecting mcs” naming, that followed from the previous structure.

Note: If there is a need to calculate DL1 as well, this is pretty straightforward here: Just link the DL0 instead and have a rule, that creates dl1 from that similar to the dl1 to dl2 rule. That would then also need to be done for the simulations as well and you probably want to use your own `lstchain`-config instead of linking the one used for the models like its done right now.

Define stuff

Important here (besides having the paths defined): There are some rules, that are really not computationally heavy. It would be a shame to have the slurm overhead for every step here, so they are `localrules`. The linking step creates a dummy file `runs-linked.txt`, that acts as a checkpoint for the later steps.

env = ENVS["data_selection"]
config = CONFIGS["data_selection"]
scripts = Path(SCRIPTS["data_selection"])
out = Path(OUTDIRS["data_selection"])
dl1 = Path(OUTDIRS["dl1"])
plots = out / "plots"


run_selection_plots = [
    plots / f"{name}.pdf"
    for name in ["moon-illumination", "cosmics", "cosmics-above", "run-pointings"]
]


rule dl1:
    input:
        dl1 / "runs-linked.txt",
        run_selection_plots,


localrules:
    runlist,
    select_datasets,
    merge_datachecks,
    run_ids,
    data_check,

There is one manual step required before this point: The runlist has to be downloaded from the lst1 website, which is password-protected. As I want to have this public, I cannot put the credentials here. It is just a simple `curl` command, so not a big deal.

Note: The analysis will not know of new runs until you redownload the runlist. That should not matter most of the times, but keep it in mind!

rule runlist:
    output:
        out / "runlist.html",
    shell:
        """
        echo 'Provide the file {output}. The command is:'
        echo 'curl --user <username>:<password> https://lst1.iac.es/datacheck/lstosa/LST_source_catalog.html -o {output}'
        echo 'You might need to create the output directory first.'
        """

Select relevant runs based on runlist

First of all, we need to select runs observing our source(s) of interest. This is done purely based on the `runlist.html` without any notion of data quality. There is a column `Source Name` in there, which should match the source. A natural expansion here would be to select multiple source names at once if thats useful for the further analysis steps (-> Background model?)

rule select_datasets:
    output:
        out / "runlist.csv",
    input:
        data=out / "runlist.html",
        config=config,
        script=scripts / "select-data.py",
    conda:
        env
    log:
        out=out / "select_datasets.log",
        err=out / "select_datasets.err",
    shell:
        "python {input.script} {input.data} {output} -c {input.config}"

Data quality checks

The next step is discarding runs, that, for one reason or another, do not qualify for further analysis. Luckily, LSTOSA produces datacheck-files, that we can use here. A datacheck-file contains runwise statistics of some quantities commonly used to gauge data quality, for example the rate of cosmic events, which directly translates to some form of efficiency as the rate of hadrons arriving at earth is considered to be constant.

For ease of use, we merge the runwise datachecks first and make the selection on that merged object.

rule merge_datachecks:
    output:
        output=out / "dl1-datachecks-merged.h5",
    input:
        data=out / "runlist.csv",
        script=scripts / "merge-datachecks.py",
    conda:
        env
    log:
        out / "merge_datacheck.log",
    shell:
        "python {input.script} {input.data} {output.output} --log-file {log}"

That selection is based on one of the config files, where e.g. thresholds for the cosmics rate are set. As a result, the rule produces:

A list of selected runs
Cuts and masked datacheck for plots

rule data_check:
    output:
        runlist=out / "runlist-checked.csv",
        datachecks=out / "dl1-datachecks-masked.h5",
        config=out / "dl1-selection-cuts-config.json",
    input:
        runlist=out / "runlist.csv",
        datachecks=out / "dl1-datachecks-merged.h5",
        config=config,
        script=scripts / "data-check.py",
    conda:
        env
    log:
        out / "datacheck.log",
    shell:
        "python \
            {input.script} \
            {input.runlist} \
            {input.datachecks} \
            --config {input.config} \
            --output-runlist {output.runlist} \
            --output-datachecks {output.datachecks} \
            --output-config {output.config} \
            --log-file {log}"

Define runs to be used for the analysis

With the list of runs from nameref:data_check, we can go ahead and link the runs from the global data-directory to our build-directory. This simplifies rules massively. Before doing that, we first convert the runlist. That is pretty arbitrary and could also be done in a single step. I do not know exactly why we did it this way, but it works, right?

checkpoint run_ids:
    output:
        runlist=out / "runs.json",
    input:
        data=out / "runlist-checked.csv",
        config=config,
        script=scripts / "create-night-run-list.py",
    conda:
        env
    log:
        out / "check_runlist.log",
    shell:
        "python \
        {input.script} \
        {input.data} \
        {output} \
        -c {input.config} \
        --log-file {log}"

Now we arrive at the first checkpoint! It is important to have this as a checkpoint, because a priori you do not know which runs you select for the analysis. It is known a few steps before this to be exact, but since this is the last (run-)linking related rule, I decided to make this the checkpoint. From now on out, all run ids are known just by looking into the `<build_dir>/dl1` folder.

checkpoint link_runs:
    output:
        dl1 / "runs-linked.txt",
    input:
        runs=out / "runs.json",
        datacheck=out / "dl1-datachecks-masked.h5",
        script=scripts / "link-runs.py",
    params:
        dl1=dl1,
    conda:
        env
    log:
        out / "link_runs.log"
    shell:
        "python \
        {input.script} \
        --runs {input.runs} \
        --dl1-link-dir {params.dl1} \
        --log-file {log} \
        --output-path {output}"

Plots

Plotting the data-selection part is very easy. Since multiple plots can be constructed from the output of the datacheck-rule, there is just one rule to handle these and the script-name is constructed from the wildcard name, which is the name of the output plot. It could also be multiple rules as not all of them need all of the input files, but this is how we constructed it a while back for the 1D-analysis.

rule plot_data_selection:
    output:
        plots / "{name}.pdf",
    input:
        data=out / "dl1-datachecks-masked.h5",
        config=out / "dl1-selection-cuts-config.json",
        script=scripts / "plot-{name}.py",
    conda:
        env
    log:
        plots / "{name}.log",
    shell:
        "python \
        {input.script} \
        {input.data} \
        -c {input.config} \
        -o {output} \
        --log-file {log} "

For the run pointings, a new file containing just these is constructed. This is actually not a big step and could be done in the plot script aswell, but I like having the csv file with the pointing directions easily accesible.

rule gather_run_pointings:
    output:
        out / "run-pointings.csv",
    input:
        runs=out / "runs.json",
        datacheck=out / "dl1-datachecks-masked.h5",
        script=scripts / "gather-run-pointings.py",
    conda:
        env
    log:
        out / "run_pointings.log",
    shell:
        "python {input.script} \
        --runs {input.runs} \
        --runsummary {input.datacheck} \
        --output {output} \
        --log-file {log} "


rule plot_run_pointings:
    output:
        plots / "run-pointings.pdf",
    input:
        pointings=out / "run-pointings.csv",
        script=scripts / "plot-run-pointings.py",
    conda:
        env
    log:
        plots / "run_pointings.log",
    shell:
        "python {input.script} \
        --input {input.pointings} \
        --output {output} \
        --log-file {log} "

MC

Variables

Produce the merged test files as well for some plots of the gammaness distributions

lstchain_env = ENVS["lstchain"]
link_env = ENVS["data_selection"]
plot_env = ENVS["gammapy"]
scripts = Path(SCRIPTS["mc"])
mc = Path(OUTDIRS["mc"])
models = mc / "models"

# Need some extra dirs
mc_nodes = Path(OUTDIRS["mc_nodes"])
dl1 = Path(OUTDIRS["dl1"])
models = Path(OUTDIRS["models"])
config = CONFIGS["lstchain"]

plots = mc / "plots"


rule mc:
    input:
        link=mc / "mc-linked.txt",
        models=models_to_train,
        plots=(plots/"model_performance.pdf"),

localrules:
    link_mc,

Link nodes

checkpoint link_mc:
    output:
        dummy=mc / "mc-linked.txt",
        config=mc / "lstchain_mcpipe.json",
    input:
        script=scripts / "link-mc.py",
    params:
        production=PRODUCTION,
        declination=DECLINATION,
        mc_nodes=mc_nodes,
    conda:
        link_env
    log:
        mc / "link_mc.log"
    shell:
        "python \
        {input.script} \
        --prod {params.production} \
        --dec {params.declination} \
        --mc-nodes-link-dir {params.mc_nodes} \
        --model-config-link-path {output.config} \
        --log-file {log} \
        --verbose \
        --output-path {output.dummy}"

Create train and test files per node

First of all, the individual runs of a single allsky node need to be merged. After this step there will 2 (train+test) diffuse gamma files per node.

rule merge_gamma_mc_per_node:
    output:
        train=mc / "GammaDiffuse/dl1/{node}_train.dl1.h5",
        test=mc / "GammaDiffuse/dl1/{node}_test.dl1.h5",
    input:
        dummy=mc / "mc-linked.txt",
        script=scripts/"merge_mc_nodes.py",
    params:
        train_size=train_size,
        directory=lambda wildcards: mc_nodes / f"GammaDiffuse/{wildcards.node}",
    conda:
        lstchain_env
    log:
        mc / "GammaDiffuse/dl1/merge_gamma_mc_{node}.log",
    shell:
        "python {input.script} \
        --input-dir {params.directory} \
        --train-size {params.train_size} \
        --output-train {output.train} \
        --output-test {output.test} \
        --pattern 'dl1_*.h5' \
        --log-file {log}"

rule merge_proton_mc_per_node:
    output:
        train=mc / "Protons/dl1/{node}_train.dl1.h5",
        test=mc / "Protons/dl1/{node}_test.dl1.h5",
    input:
        dummy=mc / "mc-linked.txt",
        script=scripts/"merge_mc_nodes.py",
    params:
        train_size=0.9,
        directory=lambda wildcards: mc_nodes / f"Protons/{wildcards.node}",
    conda:
        lstchain_env
    log:
        mc / "Protons/dl1/merge_proton_mc_{node}.log",
    shell:
        "python {input.script} \
        --input-dir {params.directory} \
        --train-size {params.train_size} \
        --output-train {output.train} \
        --output-test {output.test} \
        --pattern 'dl1_*.h5' \
        --log-file {log}"

Train models

There is only one set of models for the whole trajectory and not one for each node in order to make better use of the training statistic.

rule merge_train_or_test_of_all_nodes:
    output:
        mc / "{particle}/dl1/{particle}_{train_or_test}_merged.dl1.h5",
    input:
        nodes=MC_NODES_DL1,
        script=scripts/"merge_mc_nodes.py",
    params:
        directory=lambda wildcards: mc / f"{wildcards.particle}/dl1",
        pattern=lambda wildcards: f"*_{wildcards.train_or_test}.dl1.h5",
        out_type=lambda wildcards: f"output-{wildcards.train_or_test}",
    conda:
        lstchain_env
    log:
        mc / "{particle}/merge_all_{particle}_{train_or_test}.log",
    shell:
        """
        python {input.script} \
        --input-dir {params.directory} \
        --pattern {params.pattern} \
        --{params.out_type} {output} \
        --log-file {log}
        """

For the training it is just the lstchain script. That requires a lot of resources, because they load all of the data into RAM at once…

rule train_models:
    output:
        models_to_train,
    input:
        gamma=mc / "GammaDiffuse/dl1/GammaDiffuse_train_merged.dl1.h5",
        proton=mc / "Protons/dl1/Protons_train_merged.dl1.h5",
        config=config,
    resources:
        mem_mb=64000,
        cpus=8,
        partition="long",
        time=1200,
    conda:
        lstchain_env
    log:
        models / "train_models.log",
    shell:
        """
        lstchain_mc_trainpipe \
        --fg {input.gamma} \
        --fp {input.proton} \
        --config {input.config} \
        --output-dir {models} > {log}
        """

Plot model performance

rule plot_rf_performance:
    output:
        plots / "model_performance.pdf"
    input:
        models=models_to_train,
        gamma=mc / "GammaDiffuse/dl2/GammaDiffuse_test_merged.dl2.h5",
        proton=mc / "Protons/dl2/Protons_test_merged.dl2.h5",
        config=config,
        script=scripts/"plot_rf_performance.py"
    resources:
        mem_mb=64000,
    params:
        modeldir=models,
    conda:
        lstchain_env
    log:
        models / "plot_models.log",
    shell:
        """
        python {input.script} \
        --gammas {input.gamma} \
        --protons {input.proton} \
        --config {input.config} \
        --model-dir {params.modeldir} \
        --output {output} \
        --log-file {log}
        """

DL2

This stage contains:

dl1 to dl2 for data
dl1 to dl2 for test mc
calculate IRFs (at MC nodes!)
Plot IRFs

It is not very complicated, because the annoying work of linking and organizing files has been done before.

Definition

The target is actually super simple here: All IRFs and all dl2 files.

These are defined in variables and include the linking checkpoints.

lstchain_env = ENVS["lstchain"]
plot_env = ENVS["gammapy"]
scripts = Path(SCRIPTS["dl2"])
irfs = Path(OUTDIRS["irfs"])
irf_scripts = Path(SCRIPTS["irfs"])
mc = Path(OUTDIRS["mc"])
dl1 = Path(OUTDIRS["dl1"])
dl2 = Path(OUTDIRS["dl2"])
dl2_scripts = Path(SCRIPTS["dl2"])
models = Path(OUTDIRS["models"])
config = CONFIGS["lstchain"]

rule dl2:
    input:
        runs=DL2_FILES,

Create DL2

Here we just use the lstchain script with the newly created models. The `{somepath}` wildcard makes it so that this rule works for both the observed runs and the MC. Organizing files starts to pay off! Need a dummy input though to make sure files have been linked already

merged mc is a bit different. Problem is we dont have the dl1 data files as target anywhere, just the link :(

rule dl1_to_dl2_mc:
    output:
        mc / "{particle}/dl2/{base}.dl2.h5",
    input:
        config=config,
        models=models_to_train,
        mc_exists=mc / "mc-linked.txt",
        data=mc / "{particle}/dl1/{base}.dl1.h5",
    conda:
        lstchain_env
    resources:
        mem_mb=64000,
        cpus=4,
    log:
        mc/"{particle}/dl2/dl1_to_dl2_{base}.log"
    shell:
        """
        lstchain_dl1_to_dl2  \
            --input-file {input.data}  \
            --output-dir $(dirname {output}) \
            --path-models {models}  \
            --config {input.config}
        """


rule dl1_to_dl2_data:
    output:
        dl2 / "{base}.dl2.h5",
    input:
        config=config,
        models=models_to_train,
        dl1_exists=dl1 / "runs-linked.txt",
    params:
        data =lambda wc: dl1 / f"{wc.get('base')}.dl1.h5",
    conda:
        lstchain_env
    resources:
        mem_mb=64000,
        cpus=4,
    log:
        dl2/"dl1_to_dl2_{base}.log"
    shell:
        """
        lstchain_dl1_to_dl2  \
            --input-file {params.data}  \
            --output-dir $(dirname {output}) \
            --path-models {models}  \
            --config {input.config}
        """

IRFs

Calculating IRFs is an easy `lstchain` task. Note that these mc nodes are now not connected to the data at all (opposed to how it was done in the old 1D-analysis). This gives us more flexibility with regard to how that matching should be done and is more in line with the `lstchain 0.10` way of doing things, where the dl3 step gets all IRFs every time. That is needed to interpolate the IRFs, but works the same way when selecting the nearest one. In older `lstchain`-versions you would have to match by hand and give the script a single set of IRFs for a single observed run.

Another noteworthy thing here is, that we include the energy dispersion fix for pyirf. This was discussed in the lst-analysis call from 2023-08-28 and on the ICRC before that. Basically `pirf` produced wrongly normalized energy dispersion files, which actually seems to have an impact on higher level analyses. The script comes directly from the pyirf release (https://github.com/cta-observatory/pyirf/releases/tag/v0.10.0).

rule irf:
    output:
        irfs / "{analysis}/irfs_{node}.fits.gz",
    input:
        gammas=mc / "GammaDiffuse/dl2/{node}_test.dl2.h5",
        config=config_dir / "{analysis}/irf_tool_config.json",
        edisp_script=irf_scripts / "fix_edisp.py",
    conda:
        lstchain_env
    resources:
        mem_mb=8000,
        time=10,
    log:
        irfs / "{analysis}/irfs_{node}.log"
    shell:
        """
        lstchain_create_irf_files \
            -o {output} \
            -g {input.gammas} \
            --config {input.config}

        python {input.edisp_script} {output}
        """

Plotting is a single rule although every IRF is saved individually. This works by naming the plot scripts in a predictable way of `plot_irf.py` and have them all behave the same way w.r.t. cli arguments. The wildcard constraint makes it possible to have two wildcards in the output filename. Otherwise `irf` could match on e.g. `aeff_node_xzy_…` and `base` would only match on the last part. Maybe one could also change the behaviour of the regex matching there, but I think this solution is pretty nice.

rule plot_irf:
    output:
        "{somepath}/{analysis}/plots/{irf}/{irf}_{base}.pdf",
    input:
        data="{somepath}/{analysis}/irfs_{base}.fits.gz",
        script=irf_scripts / "plot_irf_{irf}.py",
        rc=MATPLOTLIBRC,
    conda:
        plot_env
    resources:
        mem_mb=1000,
        time=20,
    wildcard_constraints:
        irf="|".join(irfs_to_produce)
    log:
        "{somepath}/{analysis}/plots/{irf}/{irf}_{base}.log"
    shell:
        "MATPLOTLIBRC={input.rc} \
        python {input.script} \
        -i {input.data} \
        -o {output} \
        --log-file {log}"

DL3

This stage is rather complex. On the one hand, creating dl3 files from dl2 and IRFs is not difficult at all with the existing `lstchain` infrastructure. There are a lot of plots created here and skymaps are calculated, but that is not complicated, just a lot of rules. On the other hand, we need a background for the dl4 datasets. That is not a default `lstchain` task and needs to be done by hand. There is a `pybkgmodel` based script, that produces a background for every observation and then a small helper script to add the background “IRF” to the hdu-index.

Definitions

Nothing crazy here. Note, that the `irfs` path usually relates to the grid IRFs and the “final” IRFs, that are linked in the dl3 files, lie in the dl3 path.

Need to define the runwise plots extra here, because exapdn does not seem to work with the RUN_IDS function (remember its not just a variable due to the checkpoint thing)

lstchain_env = ENVS["lstchain"]
bkg_env = ENVS["background"]
gammapy_env = ENVS["gammapy"]

dl2 = Path(OUTDIRS["dl2"])
dl3 = Path(OUTDIRS["dl3"])
irfs = Path(OUTDIRS["irfs"])

scripts = Path(SCRIPTS["dl3"])
data_selection_config = CONFIGS["data_selection"]


rule dl3:
    input:
        index=[dl3 / f"{analysis}/hdu-index.fits.gz" for analysis in analyses],
        bkg=[dl3/f"{analysis}/bkg-exists" for analysis in analyses],
        plots = DL3_PLOTS,

Create DL3 Data

Individual runs

Thats just lstchain here. Using the (new at the time) interface of 0.10.x, this is incompatible with previous versions. I could make an attempt to support that as well, but I do not see why I would use previous versions again, so might as well force it.

rule dl2_to_dl3:
    output:
        run=dl3 / "{analysis}/LST-1.Run{run_id}.dl3.fits.gz",
    input:
        data=dl2 / "LST-1.Run{run_id}.dl2.h5",
        irfs=MC_NODES_IRFs, 
        config=config_dir / "{analysis}/irf_tool_config.json",
        fix_script = scripts / "patch_dl3_header.py"
    params:
        irf_pattern='irfs_*.fits.gz',
        out=lambda wc: dl3 / wc.get("analysis"),
        in_irfs=lambda wc: irfs / wc.get("analysis"),
    conda:
        lstchain_env
    resources:
        mem_mb=12000,
        time=30,
    log:
        dl3 / "{analysis}/create_dl3_{run_id}.log"
    shell:
        """
        lstchain_create_dl3_file  \
            --input-dl2 {input.data}  \
            --output-dl3-path {params.out}  \
            --input-irf-path {params.in_irfs}  \
            --irf-file-pattern {params.irf_pattern} \
            --config {input.config} \
            --gzip \
            --use-nearest-irf-node \
            --overwrite \
            --log-file {log}

        python {input.fix_script} --dl3-file {output.run}
        """

Background

After some back and forth with pybkgmodel, I decided to write my own script. Its content is subject to change, but some design goals will probably stay the same, so the workflow will not change much (hopefully):

Runwise background. There can be identical models, but every run gets a background model file with the run_id included. Not changing this around when eventually producing stacked models makes the dl3 construction simpler.
One call to the script creates all backgrounds. This is opposed to the one to one matching used throughout the workflow, but it makes me more flexible w.r.t script development. All runs are available at once meaning I can match and group them however I want. I could still read all runs every time as it is not THAT expensive (main part being the transformations of the event list and not I/O of the datastore and metadata), but this way you could e.g. create N background models for N zenith bins and use them for all runs in the bin thus effectively constructing less models.

There is a basic plot script here as well for visual inspection of things like radial symmetry and empty bins.

rule calc_count_maps:
    output:
        dl3 / "{analysis}/bkg_cached_maps.pkl"
    input:
        runs=DL3_FILES,
        config=config_dir / "{analysis}/bkgmodel.yml",
        script=scripts/"precompute_background_maps.py",
        bkg_exclusion_regions=config_dir/"{analysis}/bkg_exclusion",
    conda:
        bkg_env
    resources:
        partition="long",
        time=360,
        mem_mb=32000,
    log:
        dl3 / "{analysis}/calc_count_maps.log",
    shell:
        """python {input.script} \
        --input-runs {input.runs} \
        --exclusion {input.bkg_exclusion_regions} \
        --output {output} \
        --config {input.config} \
        --log-file {log} \
        --verbose \
        --overwrite
        """

rule calc_background:
    output:
        dummy=dl3/"{analysis}/bkg-exists",
    input:
        runs=DL3_FILES,
        config=config_dir / "{analysis}/bkgmodel.yml",
        script=scripts/"calc_background.py",
        cached_maps=dl3/"{analysis}/bkg_cached_maps.pkl",
        bkg_exclusion_regions=config_dir/"{analysis}/bkg_exclusion",
    params:
        bkg_dir=lambda wc: dl3/wc.get("analysis"),
    conda:
        bkg_env
    resources:
        partition="short",
    log:
        dl3 / "{analysis}/calc_bkg.log",
    shell:
        """python {input.script} \
        --input-runs {input.runs} \
        --output-dir {params.bkg_dir} \
        --exclusion {input.bkg_exclusion_regions} \
        --dummy-output {output.dummy} \
        --cached-maps {input.cached_maps} \
        --config {input.config} \
        --log-file {log} \
        --verbose \
        --overwrite
        """

Index

The actual work consists of calling the lstchain script and then adding new rows to the table, that link to the background files. The script is very basic and assumes, that you can sort the background files and get the same order as when sorting the observations. That is true as long as the name of the background file contains the run_id and there are no shenanigans with the leading 0.

def DL3_INDEX_FILELIST(wildcards):
    files = DL3_FILES(wildcards)
    return "--file-list " + " --file-list ".join([f.name for f in files]),

rule dl3_hdu_index:
    output:
        dl3 / "{analysis}/hdu-index.fits.gz",
    input:
        runs=DL3_FILES,
        index_script=scripts / "create_hdu_index.py",
        link_script=scripts / "link_bkg.py",
        dummy=dl3/"{analysis}/bkg-exists",
    params:
        outdir=lambda wc: dl3/wc.get("analysis"),
        bkg=BKG_FILES,
        filelist=DL3_INDEX_FILELIST,
    conda:
        lstchain_env
    log:
        dl3 / "{analysis}/hdu_index.log",
    shell:
        """
        python {input.index_script}  \
            --input-dl3-dir {params.outdir}  \
            --output-index-path {params.outdir}  \
            {params.filelist} \
            --overwrite \
            --log-file {log}

        python {input.link_script} \
        --hdu-index-path {output} \
        --bkg-files {params.bkg} \
        """

Plot dl3 stuff

rates

rule plot_dl3_rates:
    output:
        dl3 / "{analysis}/plots/rates.pdf",
    input:
        index=dl3 / "{analysis}/hdu-index.fits.gz",
        script=scripts/"plot_rates.py",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        plots / "{analysis}/rates.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {input.index} -o {output} --log-file {log}"

irf_comparison

rule plot_dl3_irf_comparison:
    output:
        dl3 / "{analysis}/plots/irfs.pdf",
    input:
        index=dl3 / "{analysis}/hdu-index.fits.gz",
        script=scripts/"plot_irf_comparison.py",
        config=config_dir / "{analysis}/analysis.yaml",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/plots/irfs.log",
    shell:
        """MATPLOTLIBRC={input.rc} python {input.script} \
        -i {input.index} \
        -o {output} \
        --config {input.config} \
        --log-file {log}
        """

Theta 2

rule calc_theta2_per_obs:
    output:
        dl3 / "{analysis}/theta2/{run_id}.fits.gz",
    input:
        data=dl3 / "{analysis}/LST-1.Run{run_id}.dl3.fits.gz",
        script=scripts/"calc_theta2_per_obs.py",
        config=data_selection_config, # this seems unnecessary
        index=dl3 / "{analysis}/hdu-index.fits.gz",
    params:
        outdir=lambda wc: dl3/wc.get("analysis"),
    wildcard_constraints:
        run_id="\d+",  # dont match on "stacked".
    resources:
        mem_mb=16000,
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/theta2/calc_{run_id}.log",
    shell:
        "python {input.script} -i {params.outdir} -o {output} --obs-id {wildcards.run_id} --config {input.config} --log-file {log}"

def dl3_all_theta_tables(wildcards):
    ids = RUN_IDS(wildcards)
    return [dl3 / f"{wildcards.analysis}/theta2/{run}.fits.gz" for run in ids]

rule stack_theta2:
    output:
        dl3 / "{analysis}/theta2/stacked.fits.gz",
    input:
        runs=dl3_all_theta_tables,
        script=scripts/"stack_theta2.py",
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/theta2/theta2_stacked.log",
    shell:
        "python {input.script} -o {output} --input-files {input.runs} --log-file {log}"


rule plot_theta:
    output:
        dl3 / "{analysis}/plots/theta2/theta2_{run_id}.pdf",
    input:
        data=dl3 / "{analysis}/theta2/{run_id}.fits.gz",
        script=scripts/"plot_theta2.py",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/plots/theta2/plot_{run_id}.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {input.data} -o {output} --log-file {log}"

Background

super suboptimal due to the bkg thing

rule plot_background:
    output:
        dl3 / "{analysis}/plots/bkg/bkg_{run_id}.pdf"
    input:
        data=dl3 / "{analysis}/bkg-exists",
        script=scripts / "plot_bkg.py",
        rc=MATPLOTLIBRC,
    params:
        data=lambda wildcards: dl3 / f"{wildcards.analysis}/bkg_{wildcards.run_id}.fits.gz",
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/plots/bkg/bkg_{run_id}.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {params.data} -o {output}"

Skymaps

rule calc_skymap:
    output:
        dl3 / "{analysis}/skymap/{run_id}.fits.gz",
    input:
        data=dl3 / "{analysis}/LST-1.Run{run_id}.dl3.fits.gz",
        script=scripts/"calc_skymap_gammas.py",
        config=config_dir / "{analysis}/irf_tool_config.json",
        index=dl3 / "{analysis}/hdu-index.fits.gz",
    wildcard_constraints:
        run_id="\d+",  # dont match on "stacked".
    params:
        outdir=lambda wc: dl3/wc.get("analysis"),
        n_bins=50,
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/skymap/calc_{run_id}.log",
    shell:
        "python {input.script} -i {params.outdir} -o {output} --obs-id {wildcards.run_id} --config {input.config} --log-file {log} --n-bins {params.n_bins}"


def dl3_all_skymaps(wildcards):
    ids = RUN_IDS(wildcards)
    return [dl3 / f"{wildcards.analysis}/skymap/{run}.fits.gz" for run in ids]


rule stack_skymaps:
    output:
        dl3 / "{analysis}/skymap/stacked.fits.gz",
    input:
        data=dl3_all_skymaps,
        script=scripts/"stack_skymap.py",
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/skymap/stack.log",
    shell:
        "python {input.script} -i {input.data} -o {output} --log-file {log}"


rule plot_skymap:
    output:
        dl3 / "{analysis}/plots/skymap/skymap_{run_id}.pdf",
    input:
        data=dl3 / "{analysis}/skymap/{run_id}.fits.gz",
        script=scripts/"plot_skymap.py",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        dl3 / "{analysis}/plots/skymap/plot_{run_id}.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {input.data} -o {output} --log-file {log}"

Cuts dl3 (kinda irf)

rule cuts_dl2_dl3:
    output:
        dl3 / "{analysis}/counts_after_cuts/{run_id}.h5",
    input:
        dl2=dl2/ "LST-1.Run{run_id}.dl2.h5",
        irf=dl3 / "{analysis}/LST-1.Run{run_id}.dl3.fits.gz",
        config=config_dir / "{analysis}/irf_tool_config.json",
        lstchain_config=config_dir / "lstchain_config.json",
        script=scripts/"calc_counts_after_cuts.py",
    wildcard_constraints:
        run_id="\d+",  # dont match on "stacked".
    resources:
        mem_mb="64G",
    conda:
       lstchain_env
    log:
        dl3 / "{analysis}/counts_after_cuts/calc_{run_id}.log",
    shell:
        "python {input.script} --input-dl2 {input.dl2} --input-irf {input.irf} --lstchain-config {input.lstchain_config} -c {input.config} -o {output} --log-file {log}"

def dl3_all_counts(wildcards):
    ids = RUN_IDS(wildcards)
    return [dl3 / f"{wildcards.analysis}/counts_after_cuts/{run}.h5" for run in ids]

rule stack_cuts_dl2_dl3:
    output:
        dl3/ "{analysis}/counts_after_cuts/stacked.h5",
    input:
        data=dl3_all_counts,
        script=scripts/"stack_counts_after_cuts.py",
        rc=MATPLOTLIBRC,
    conda:
        lstchain_env
    log:
        dl3 / "{analysis}/counts_after_cuts/stack.log", # TODO use this
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {input.data} -o {output}"

rule plot_cuts_dl2_dl3:
    output:
        dl3 / "{analysis}/plots/counts_after_cuts/counts_after_cuts_{run_id}.pdf",
    input:
        data=dl3/ "{analysis}/counts_after_cuts/{run_id}.h5",
        script=scripts/"plot_counts_after_cuts.py",
        rc=MATPLOTLIBRC,
    conda:
        lstchain_env
    log:
        dl3 / "{analysis}/dl3/counts_after_cuts/plot_counts_{run_id}.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {input.data} -o {output} --log-file {log}"

plot dl3 irfs

rule plot_run_irf:
    output:
        dl3/"{analysis}/plots/{irf}/{irf}_{run_id}.pdf",
    input:
        data=dl3 / "{analysis}/LST-1.Run{run_id}.dl3.fits.gz",
        script=irf_scripts / "plot_irf_{irf}.py",
        rc=MATPLOTLIBRC,
    conda:
        plot_env
    resources:
        mem_mb=1000,
        time=20,
    wildcard_constraints:
        irf="|".join(irfs_to_produce)
    log:
        dl3/"{analysis}/plots/{irf}/{irf}_{run_id}.log"
    shell:
        "MATPLOTLIBRC={input.rc} \
        python {input.script} \
        -i {input.data} \
        -o {output} \
        --log-file {log}"

DL4

Definitions

gammapy_env = ENVS["gammapy"]
dl3 = Path(OUTDIRS["dl3"])
dl4 = Path(OUTDIRS["dl4"])
scripts = Path(SCRIPTS["dl4"])

dl4_plot_types = ["dataset_peek"]#, "dl4_diagnostics"]
dl4_plots = [dl4 / f"{analysis}/plots/{plot}.pdf" for analysis in analyses for plot in dl4_plot_types]

rule dl4:
    input:
        dl4_plots

Create MapDataset

rule create_fov_bkg_exclusion:
    output:
        dl4 / "{analysis}/bkg_exclusion.fits.gz"
    input:
        region=config_dir / "{analysis}/bkg_exclusion",
        script=scripts/"create_fits_exclusion.py",
        config=config_dir / "{analysis}/analysis.yaml",
    conda:
        gammapy_env
    log:
        dl4 / "{analysis}/create_exclusion.log",
    shell:
        "python {input.script}  -i {input.region} -o {output} --log-file {log} -c {input.config} --verbose"

rule create_dataset:
    output:
        datasets=dl4/ "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
    input:
        data=dl3 / "{analysis}/hdu-index.fits.gz",
        config=config_dir / "{analysis}/analysis.yaml",
        script=scripts/"write_datasets_3d_manual.py",
        bkg_exclusion_regions=dl4 / "{analysis}/bkg_exclusion.fits.gz"
    conda:
        gammapy_env
    resources:
        cpus=1,
        mem_mb=32000,
    log:
        dl4/ "{analysis}/datasets.log",
    shell:
        "python {input.script} -c {input.config}  -o {output.datasets} -m {output.bkg_fit} --log-file {log} --n-jobs {resources.cpus} --verbose"

Plot DL4 statistics

rule calc_dl4_diagnostics:
    output:
        dl4/ "{analysis}/dl4_diagnostics.fits.gz",
    input:
        data=dl4/"{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        config=config_dir / "{analysis}/analysis.yaml",
        script=scripts/"calc_dl4_diagnostics.py",
    resources:
        mem_mb=16000,
    conda:
        gammapy_env
    log:
        dl4/ "{analysis}/dl4_diagnostics.log",
    shell:
        "python {input.script} -c {input.config} -o {output} --datasets-path {input.data} --models-path {input.bkg_fit} --log-file {log}"


rule peek_datasets:
    output:
        dl4/"{analysis}/plots/dataset_peek.pdf",
    input:
        data=dl4/ "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        script=scripts/"plot_dataset_peek.py",
        config=config_dir / "{analysis}/analysis.yaml",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        dl4 / "{analysis}/plots/dataset_peek.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -c {input.config} -o {output} --datasets-path {input.data} --models-path {input.bkg_fit} --log-file {log}"


rule plot_dl4_dianotics:
    output:
        dl4 / "{analysis}/plots/dl4_diagnostics.pdf",
    input:
        data=dl4/"{analysis}/dl4_diagnostics.fits.gz",
        script=scripts/"plot_dl4_diagnostics.py",
        rc=os.environ.get("MATPLOTLIBRC", config_dir / "matplotlibrc"),
    conda:
        gammapy_env
    log:
        dl4 / "{analysis}/plots/dl4_diagnostics.log",
    shell:
        "MATPLOTLIBRC={input.rc} python {input.script} -i {input.data} -o {output} --log-file {log}"

DL5

Because most of these steps are a simple calc something -> plot results, I have not put the plot rules separate here, but instead organised it by the values to fit/compute.

Define stuff

gammapy_env = ENVS["gammapy"]
dl4 = Path(OUTDIRS["dl4"])
dl5 = Path(OUTDIRS["dl5"])
scripts = Path(SCRIPTS["dl5"])

dl5_plot_types = ["ts_significance_map", "ts_significance_distribution", "excess_significance_map", "excess_significance_distribution"]#, "2d_flux_profile", "fit_residuals"]
#_curve] significance distribution on vs off

dl5_plots = [dl5 / f"{analysis}/plots/{plot}.pdf" for analysis in analyses for plot in dl5_plot_types]

rule dl5:
    input:
        dl5_plots

2D Flux profile

https://docs.gammapy.org/1.1/tutorials/analysis-3d/flux_profiles.html

rule calc_2d_flux_profile:
    output:
        dl5 / "{analysis}/2d_flux_profile.fits.gz",
    input:
        data=dl4 / "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        script=scripts/"calc_2d_flux_profile.py",
    conda:
        gammapy_env
    log:
        dl5/ "{analysis}/calc_2d_flux_profile.log",
    shell:
        """
        python {input.script} \
        --datasets-path {input.data} \
        --models-path {input.bkg_fit} \
        --output {output} \
        --log-file {log}
        """

rule plot_2d_flux_profile:
    output:
        dl5 / "{analysis}/plots/2d_flux_profile.pdf",
    input:
        flux_points=dl5 / "{analysis}/2d_flux_profile.fits.gz",
        script=scripts/"plot_2d_flux_profile.py",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        dl5 / "{analysis}/plots/plot_2d_flux_profile.log",
    shell:
        """
        MATPLOTLIBRC={input.rc} \
        python {input.script} \
        --flux-points {input.flux_points} \
        --output {output} \
        --log-file {log}
        """

Fit Skymodel(s)

https://docs.gammapy.org/1.1/tutorials/analysis-3d/analysis_3d.html

model für die ghostbuster?

rule model_best_fit:
    output:
        dl5 / "{analysis}/model-best-fit.yaml",
    input:
        config=config_dir / "{analysis}/analysis.yaml",
        dataset=dl4 / "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        model=config_dir / "{analysis}/models.yaml",
        script=scripts/"fit-model.py",
    conda:
        gammapy_env
    log:
        dl5 / "{analysis}/model_best_fit.log",
    resources:
        partition="long",
        time=1200,
    shell:
        """
        python {input.script} \
            -c {input.config} \
            --datasets-path {input.dataset} \
            --bkg-models-path {input.bkg_fit} \
            --model-config {input.model} \
            -o {output} \
        """

rule plot_residual_map:
    output:
        dl5/ "{analysis}/plots/fit_residuals.pdf",
    input:
        data=dl4/ "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        model=dl5/ "{analysis}/model-best-fit.yaml",
        config=config_dir / "{analysis}/analysis.yaml",
        script=scripts/"plot_residuals.py",
    conda:
        gammapy_env
    shell:
        """
        python {input.script} \
            -c {input.config} \
            --datasets-path {input.data} \
            --bkg-models-path {input.bkg_fit} \
            --best-model-path {input.model} \
            -o {output}
        """

Significance map

rule calc_significance_map:
    output:
        dl5 / "{analysis}/ts_significance_map.pkl",
    input:
        data=dl4 / "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        script=scripts/"calc_significance_map.py",
    conda:
        gammapy_env
    log:
        dl5 / "{analysis}/calc_significance_map.log",
    shell:
        """
        python {input.script} \
        --datasets-path {input.data} \
        --models-path {input.bkg_fit} \
        --output {output} \
        --log-file {log}
        """

rule calc_excess_map:
    output:
        dl5 / "{analysis}/excess_significance_map.pkl",
    input:
        data=dl4 / "{analysis}/datasets.fits.gz",
        bkg_fit=dl4/"{analysis}/bkg_fit.yaml",
        script=scripts/"calc_excess_map.py",
    conda:
        gammapy_env
    log:
        dl5 / "{analysis}/calc_excess_map.log",
    shell:
        """
        python {input.script} \
        --datasets-path {input.data} \
        --models-path {input.bkg_fit} \
        --output {output} \
        --log-file {log}
        """

rule plot_significance_map:
    output:
        dl5 / "{analysis}/plots/{significance}_map.pdf",
    input:
        lima_map=dl5 / "{analysis}/{significance}_map.pkl",
        script=scripts/"plot_significance_map.py",
        rc=MATPLOTLIBRC,
    conda:
        gammapy_env
    log:
        dl5 / "{analysis}/plots/plot_{significance}_map.log",
    resources:
        mem_mb=16000,
    shell:
        """
        MATPLOTLIBRC={input.rc} \
        python {input.script} \
        --flux-maps {input.lima_map} \
        --output {output} \
        --log-file {log}
        """

rule plot_significance_distribution:
    output:
        dl5 / "{analysis}/plots/{significance}_distribution.pdf",
    input:
        lima_map=dl5 / "{analysis}/ts_significance_map.pkl",
        script=scripts/"plot_significance_distribution.py",
        rc=MATPLOTLIBRC,
        exclusion_mask=dl4 / "{analysis}/bkg_exclusion.fits.gz"
    conda:
        gammapy_env
    resources:
        mem_mb=16000,
    log:
        dl5 / "{analysis}/plots/plot_{significance}_distribution.log",
    shell:
        """
        MATPLOTLIBRC={input.rc} \
        python {input.script} \
        --input-maps {input.lima_map} \
        --exclusion-mask {input.exclusion_mask} \
        --output {output} \
        --log-file {log}
        """

Flux Points

# Fit flux etc.
rule calc_flux_points:
    output:
        dl5/ "{analysis}/flux_points.fits.gz",
    input:
        data=dl4 / "{analysis}/datasets.fits.gz",
        model=dl5/ "{analysis}/model-best-fit.yaml",
        config=config_dir / "{analysis}/analysis.yaml",
        script=scripts/"calc_flux_points.py",
    conda:
        gammapy_env
    shell:
        """
        python {input.script} \
            -c {input.config} \
            --datasets-path {input.data} \
            --best-model-path {input.model} \
            -o {output}
        """

rule plot_flux_points:
    output:
        dl5 / "{analysis}/plots/flux_points.pdf",
    input:
        data=dl5/ "{analysis}/flux_points.fits.gz",
        model=dl5/ "{analysis}/model-best-fit.yaml",
        script=scripts/"plot_flux_points.py",
    conda:
        gammapy_env
    shell:
        """
        python {input.script} \
            -i {input.data} \
            --best-model-path {input.model} \
            -o {output}
        """

Light curve

rule calc_light_curve:
    input:
        model=dl5/ "{analysis}/model-best-fit.yaml",
        config=config_dir / "{analysis}/analysis.yaml",
        dataset=dl4/ "{analysis}/datasets.fits.gz",
        fit=dl4/"{analysis}/model-best-fit.yaml",
        script=scripts/"calc_light_curve.py",
    output:
        dl5/ "{analysis}/light_curve.fits.gz",
    conda:
        gammapy_env
    shell:
        """
        python {input.script} \
            -c {input.config} \
            --dataset-path {input.dataset} \
            --bkg-models-path {input.fit} \
            --best-model-path {input.model} \
            -o {output} \
        """

DM

J maps wie?

Define stuff

dm_env = ENVS["dark_matter"]
dm = Path(OUTDIRS["dm"])
dl4 = Path(OUTDIRS["dl4"])
scripts = Path(SCRIPTS["dm"])
channels = ["b", "tau", "W", "mu"] # configurable?

rule dm:
    input:
        ul_plots=expand(dm / f"{analysis}/plots/uls_{channel}.pdf", analysis=analyses, channel=channels)

Fit ULs

What model results should this get? I guess only the bkg fits? if more models (eg pointsource are to be fitted, they should be refitted. Could be added in config maybe?)
right now titrate does its own thing anyway, but I want to already set this up

rule gen_uls:
    output:
        dm / "{analysis}/uls_{channel}.fits.gz"
    input:
        dataset = dl4 / "{analysis}/datasets.fits.gz",
        dm_config = configs / "dm_config.yaml",
        script=scripts/"fit_uls.py",
        models_fit=dl4/"{analysis}/bkg_fit.yaml",
    conda:
        dm_env
    log:
        dm / "{analysis}/{channel}.log"
    shell:
        """
        python {input.script} \
        --dataset {input.dataset} \
        --config-path {input.dm_config} \
        --models-path {input.models_fit} \
        --output {output} \
        --log-file {log} \
        --channel {channel} \
        --verbose
        """

Plot ULs

rule plot_uls:
    output:
        dm / "{analysis}/plots/uls_{channel}.pdf"
    input:
        uls = dm / "{analysis}/uls_channel.fits.gz",
        script=scripts/"fit_uls.py",
    conda:
        dm_env
    log:
        dm / "{analysis}/plots/{channel}.log"
    shell:
        """
        python {input.script} \
        --dataset {input.dataset} \
        --output {output} \
        --log-file {log} \
        --verbose
        """

Files

analysis.org

Latest commit

History

analysis.org

File metadata and controls

Analysis

Missing things and bugs

Main Snakefile

Defining Paths etc

DL1

Define stuff

Select relevant runs based on runlist

Data quality checks

Define runs to be used for the analysis

Plots

MC

Variables

Link nodes

Create train and test files per node

Train models

Plot model performance

DL2

Definition

Create DL2

IRFs

DL3

Definitions

Create DL3 Data

Individual runs

Background

Index

Plot dl3 stuff

rates

irf_comparison

Theta 2

Background

Skymaps

Cuts dl3 (kinda irf)

plot dl3 irfs

DL4

Definitions

Create MapDataset

Plot DL4 statistics

DL5

Define stuff

2D Flux profile

Fit Skymodel(s)

Significance map

Flux Points

Light curve

DM

Define stuff

Fit ULs

Plot ULs