Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in what variables are available to exit-script comapred to Cylc7 #6360

Open
ColemanTom opened this issue Sep 6, 2024 · 4 comments
Open
Labels
question Flag this as a question for the next Cylc project meeting.
Milestone

Comments

@ColemanTom
Copy link
Contributor

Description

In Cylc7, variables defined in various script sections were available to exit-script.
In Cylc8, variables defined in various script sections are not available to exit-script even if they are exported. For example, pre-script defined variables are not available to exit-script.

I've not seen this change in behaviour documented, but maybe I missed it. If it is unexpected, then this is a bug.

NOTE: I've only tested with exit-script, but my guess is similar behaviour applies to err-script.

Reproducible Example

In Cylc7, confirming that variables in pre-script do not need to be exported to be seen in the exit-script:

[scheduling]
    initial cycle point = 2020
    final cycle point = 2020
    [[dependencies]]
        [[[R1]]]
            graph = a
[runtime]
    [[a]]
        script = true
        pre-script = """export GF=BAR
                        ASD=FOO"""
        post-script = echo post: $FOO $GF $ASD
        exit-script = echo exit: $FOO $GF $ASD
        [[[environment]]]
            FOO = bar

Gives log output below. You can see, variables exported or not exported are all available to exit-script.

$ cat log/job/20200101T0000Z/a/NN/job.out
Suite    : basic
Task Job : 20200101T0000Z/a/01 (try 1)
User@Host:

post: bar BAR FOO
2024-09-06T01:06:09Z INFO - started
2024-09-06T01:06:09Z INFO - succeeded
exit: bar BAR FOO

Compare to Cylc8 version:

[scheduling]
    initial cycle point = 2020
    final cycle point = 2020
    [[graph]]
        R1 = a

[runtime]
    [[a]]
        script = true
        pre-script = """export GF=BAR
                        ASD=FOO"""
        post-script = echo post: $FOO $GF $ASD
        exit-script = echo exit: $FOO $GF $ASD
        [[[environment]]]
            FOO = bar

And the logs - you can see the first variable, which is defined in the [[[environment]]] section, can't even be resolved.

$ cat ~/cylc-run/basic-workflow/runN/log/job/20200101T0000Z/a/NN/job.out
Workflow : basic-workflow/run1
Job : 20200101T0000Z/a/01 (try 1)
User@Host:

post: bar BAR FOO
2024-09-06T01:10:48Z INFO - started
2024-09-06T01:10:50Z INFO - succeeded
$ cat ~/cylc-run/basic-workflow/runN/log/job/20200101T0000Z/a/NN/job.err
$HOME/cylc-run/basic-workflow/run1/log/job/20200101T0000Z/a/01/job: line 65: FOO: unbound variable

Expected Behaviour

Consistent behaviour with Cylc7 - variables defined in any section should be available to exit-script.

@ColemanTom ColemanTom added the bug Something is wrong :( label Sep 6, 2024
@oliver-sanders
Copy link
Member

I've not seen this change in behaviour documented, but maybe I missed it. If it is unexpected, then this is a bug.

Looks like we haven't documented this particular consequence, but this is expected.

The env-script/pre-script/script/post-script are run in a subshell, this isolates them from the job script itself which protects the cylc message command against environment alterations and is better for signal catching:

# Env-Script, User Environment, Pre-Script, Script and Post-Script
# Run user scripts in subshell to protect cylc job script from interference.
# Waiting on background process allows signal traps to trigger immediately.
cylc__job__run_user_scripts &
CYLC_TASK_USER_SCRIPT_PID=$!
wait "${CYLC_TASK_USER_SCRIPT_PID}" || {
# Check return code for signals (value greater than 128).
typeset ret_code="$?"
if ((ret_code > 128)); then
# Trigger the EXIT trap if the process exited due to a signal.
exit "$ret_code"
else
# Trigger ERR trap while preserving the exit code
# (NB: Bash versions are buggy and neither return statement nor
# subshelled exit won't do here.)
cylc__set_return "$ret_code"
fi
}

The err-script is run after errors have been caught or signals trapped in the original shell (because we can't get back into the subshell):

if "${run_err_script}"; then
cylc__job__run_inst_func 'err_script' "${signal}" >&2
fi

I think this ensures that err-script is called more reliably when signals are involved but mandates a separate shell, hence no environment inheritance.

Workaround: If you are trying to diagnose a particular command failure, you can work around this by putting the logic into the script section:

script = my-command || ( diagnosis-script && exit 1 )

Question: Is there anything we can reasonably do to facilitate this sort of use case? I guess we could potentially env > job.env in the subshell, then source job.env in the err-script, but this might not work reliably when signals are involved?

@oliver-sanders oliver-sanders added question Flag this as a question for the next Cylc project meeting. and removed bug Something is wrong :( labels Sep 6, 2024
@oliver-sanders oliver-sanders added this to the 8.x milestone Sep 6, 2024
@oliver-sanders
Copy link
Member

@ColemanTom
Copy link
Contributor Author

Ok. Interesting, although I have not read it all.

The use case in this specific one can be done in other ways. The original Cylc7 job was

  1. Run a job (script)
  2. If it succeeds, backup data to a failover disk (exit-script)

The variables to do the failover backup include things like ROSE_DATAC which is defined in previous sections.

It can be done in script but it was setup so script was consistent for all tasks, and changing it is just extra stuff to duplicate until the ability to to something like inherit += comes around (I know there is an Issue for that which I'm not going to lookup to link in). It can be moved into a post-script, which is what the develop has done for now. However, is there a way in the post-script to check the status and only do something on success? I think there is, but I can't remember.

@oliver-sanders
Copy link
Member

is there a way in the post-script to check the status and only do something on success

The post-script won't be run if the script fails?

The variables to do the failover backup include things like ROSE_DATAC which is defined in previous sections.

If it's only the derived Rose variables that are required you can do this:

pre-script = eval $(rose task-env)
script = my-script
exit-script = """
    eval $(rose task-env)
    backup_data
"""

The use case in this specific one can be done in other ways

  1. Target the specific part of the script your logic concerns
pre-script = eval $(rose task-env)
script = """
    setup
    myscript && backup_data
    post
"""
  1. Do it in a post-script
pre-script = eval $(rose task-env)
script = myscript
post-script = backup_data
err-script = recover
  1. Do it in the graph
[scheduling]
    [[graph]]
        R1 = """
            my_script? => backup_data
            my_script:fail? => recovery_task
        """

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Flag this as a question for the next Cylc project meeting.
Projects
None yet
Development

No branches or pull requests

2 participants