Change in what variables are available to exit-script comapred to Cylc7 #6360

ColemanTom · 2024-09-06T01:13:54Z

Description

In Cylc7, variables defined in various script sections were available to exit-script.
In Cylc8, variables defined in various script sections are not available to exit-script even if they are exported. For example, pre-script defined variables are not available to exit-script.

I've not seen this change in behaviour documented, but maybe I missed it. If it is unexpected, then this is a bug.

NOTE: I've only tested with exit-script, but my guess is similar behaviour applies to err-script.

Reproducible Example

In Cylc7, confirming that variables in pre-script do not need to be exported to be seen in the exit-script:

[scheduling]
    initial cycle point = 2020
    final cycle point = 2020
    [[dependencies]]
        [[[R1]]]
            graph = a
[runtime]
    [[a]]
        script = true
        pre-script = """export GF=BAR
                        ASD=FOO"""
        post-script = echo post: $FOO $GF $ASD
        exit-script = echo exit: $FOO $GF $ASD
        [[[environment]]]
            FOO = bar

Gives log output below. You can see, variables exported or not exported are all available to exit-script.

$ cat log/job/20200101T0000Z/a/NN/job.out
Suite    : basic
Task Job : 20200101T0000Z/a/01 (try 1)
User@Host:

post: bar BAR FOO
2024-09-06T01:06:09Z INFO - started
2024-09-06T01:06:09Z INFO - succeeded
exit: bar BAR FOO

Compare to Cylc8 version:

[scheduling]
    initial cycle point = 2020
    final cycle point = 2020
    [[graph]]
        R1 = a

[runtime]
    [[a]]
        script = true
        pre-script = """export GF=BAR
                        ASD=FOO"""
        post-script = echo post: $FOO $GF $ASD
        exit-script = echo exit: $FOO $GF $ASD
        [[[environment]]]
            FOO = bar

And the logs - you can see the first variable, which is defined in the [[[environment]]] section, can't even be resolved.

$ cat ~/cylc-run/basic-workflow/runN/log/job/20200101T0000Z/a/NN/job.out
Workflow : basic-workflow/run1
Job : 20200101T0000Z/a/01 (try 1)
User@Host:

post: bar BAR FOO
2024-09-06T01:10:48Z INFO - started
2024-09-06T01:10:50Z INFO - succeeded
$ cat ~/cylc-run/basic-workflow/runN/log/job/20200101T0000Z/a/NN/job.err
$HOME/cylc-run/basic-workflow/run1/log/job/20200101T0000Z/a/01/job: line 65: FOO: unbound variable

Expected Behaviour

Consistent behaviour with Cylc7 - variables defined in any section should be available to exit-script.

The text was updated successfully, but these errors were encountered:

oliver-sanders · 2024-09-06T09:01:42Z

I've not seen this change in behaviour documented, but maybe I missed it. If it is unexpected, then this is a bug.

Looks like we haven't documented this particular consequence, but this is expected.

The env-script/pre-script/script/post-script are run in a subshell, this isolates them from the job script itself which protects the cylc message command against environment alterations and is better for signal catching:

cylc-flow/cylc/flow/etc/job.sh

Lines 141 to 158 in 293c3cc

    
           # Env-Script, User Environment, Pre-Script, Script and Post-Script 
        
           # Run user scripts in subshell to protect cylc job script from interference. 
        
           # Waiting on background process allows signal traps to trigger immediately. 
        
           cylc__job__run_user_scripts & 
        
           CYLC_TASK_USER_SCRIPT_PID=$! 
        
           wait "${CYLC_TASK_USER_SCRIPT_PID}" || { 
        
               # Check return code for signals (value greater than 128). 
        
               typeset ret_code="$?" 
        
               if ((ret_code > 128)); then 
        
                   # Trigger the EXIT trap if the process exited due to a signal. 
        
                   exit "$ret_code" 
        
               else 
        
                   # Trigger ERR trap while preserving the exit code 
        
                   # (NB: Bash versions are buggy and neither return statement nor 
        
                   # subshelled exit won't do here.) 
        
                   cylc__set_return "$ret_code" 
        
               fi 
        
           }

The err-script is run after errors have been caught or signals trapped in the original shell (because we can't get back into the subshell):

cylc-flow/cylc/flow/etc/job.sh

Lines 283 to 285 in 293c3cc

    
           if "${run_err_script}"; then 
        
               cylc__job__run_inst_func 'err_script' "${signal}" >&2 
        
           fi

I think this ensures that err-script is called more reliably when signals are involved but mandates a separate shell, hence no environment inheritance.

Workaround: If you are trying to diagnose a particular command failure, you can work around this by putting the logic into the script section:

script = my-command || ( diagnosis-script && exit 1 )

Question: Is there anything we can reasonably do to facilitate this sort of use case? I guess we could potentially env > job.env in the subshell, then source job.env in the err-script, but this might not work reliably when signals are involved?

oliver-sanders · 2024-09-06T09:04:45Z

Further reading:

ColemanTom · 2024-09-08T22:54:08Z

Ok. Interesting, although I have not read it all.

The use case in this specific one can be done in other ways. The original Cylc7 job was

Run a job (script)
If it succeeds, backup data to a failover disk (exit-script)

The variables to do the failover backup include things like ROSE_DATAC which is defined in previous sections.

It can be done in script but it was setup so script was consistent for all tasks, and changing it is just extra stuff to duplicate until the ability to to something like inherit += comes around (I know there is an Issue for that which I'm not going to lookup to link in). It can be moved into a post-script, which is what the develop has done for now. However, is there a way in the post-script to check the status and only do something on success? I think there is, but I can't remember.

oliver-sanders · 2024-09-10T08:47:14Z

is there a way in the post-script to check the status and only do something on success

The post-script won't be run if the script fails?

The variables to do the failover backup include things like ROSE_DATAC which is defined in previous sections.

If it's only the derived Rose variables that are required you can do this:

pre-script = eval $(rose task-env)
script = my-script
exit-script = """
    eval $(rose task-env)
    backup_data
"""

The use case in this specific one can be done in other ways

Target the specific part of the script your logic concerns

pre-script = eval $(rose task-env)
script = """
    setup
    myscript && backup_data
    post
"""

Do it in a post-script

pre-script = eval $(rose task-env)
script = myscript
post-script = backup_data
err-script = recover

Do it in the graph

[scheduling]
    [[graph]]
        R1 = """
            my_script? => backup_data
            my_script:fail? => recovery_task
        """

ColemanTom added the bug Something is wrong :( label Sep 6, 2024

oliver-sanders added question Flag this as a question for the next Cylc project meeting. and removed bug Something is wrong :( labels Sep 6, 2024

oliver-sanders added this to the 8.x milestone Sep 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change in what variables are available to exit-script comapred to Cylc7 #6360

Change in what variables are available to exit-script comapred to Cylc7 #6360

ColemanTom commented Sep 6, 2024

oliver-sanders commented Sep 6, 2024

oliver-sanders commented Sep 6, 2024

ColemanTom commented Sep 8, 2024

oliver-sanders commented Sep 10, 2024

Change in what variables are available to exit-script comapred to Cylc7 #6360

Change in what variables are available to exit-script comapred to Cylc7 #6360

Comments

ColemanTom commented Sep 6, 2024

Description

Reproducible Example

Expected Behaviour

oliver-sanders commented Sep 6, 2024

oliver-sanders commented Sep 6, 2024

ColemanTom commented Sep 8, 2024

oliver-sanders commented Sep 10, 2024