Broadcast - GraphQLLocatedError: dictionary changed size during iteration #6222

Closed
ColemanTom opened this issue Jul 11, 2024 · 16 comments

@ColemanTom
Contributor

ColemanTom commented Jul 11, 2024

Description

This is in CYLC_VERSION=8.3.2

2024-07-11T04:09:24Z INFO - Command "kill_tasks" actioned with 1 warnings. ID=baf57597-47c1-4ff5-a8a2-39875731e8e7
2024-07-11T04:09:25Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20230201T0000Z'], mode=put_broadcast, namespaces=['CLEAN_AND_RESET_002'], settings=[{'environment': {'FHRS': '000 012 024 036 042 048'}}])
2024-07-11T04:09:25Z INFO - Broadcast set:
    + [20230201T0000Z/CLEAN_AND_RESET_002] [environment]FHRS=000 012 024 036 042 048
2024-07-11T04:09:25Z INFO - [20230201T0000Z/archive_um_012_008/01:running] => succeeded
Traceback (most recent call last):
  File "miniconda3/envs/cylc-8.3.2/lib/python3.11/site-packages/promise/promise.py", line 844, in handle_future_result
    resolve(future.result())
            ^^^^^^^^^^^^^^^
  File "miniconda3/envs/cylc-8.3.2/lib/python3.11/site-packages/cylc/flow/network/graphql.py", line 433, in async_resolve
    return await next_(root, info, **args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/cylc-8.3.2/lib/python3.11/site-packages/promise/iterate_promise.py", line 10, in iterate_promise
    yield from promise.future  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^
graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration

2024-07-11T04:09:25Z ERROR - dictionary changed size during iteration
2024-07-11T04:09:25Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration

It appears to be a race condition with a dictionary not being threadsafe.
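
For context, the message itself is the generic Python error raised when a dict gains or loses a key while something is still iterating over it. The following is a minimal sketch of that failure mode, not code from Cylc; the `settings` dict and both helper names are hypothetical. Iterating over a snapshot of the keys is one common way to avoid it.

```python
def prune_empty_unsafe(settings):
    """Delete empty values while iterating the live dict (raises RuntimeError)."""
    for key in settings:
        if settings[key] == "":
            del settings[key]


def prune_empty_safe(settings):
    """Delete empty values while iterating a snapshot of the keys."""
    for key in list(settings):
        if settings[key] == "":
            del settings[key]


def demo():
    """Return the error message from the unsafe variant and the safe result."""
    # Hypothetical settings, loosely modelled on the broadcast above.
    broken = {"script": "true", "pre-script": "", "post-script": ""}
    try:
        prune_empty_unsafe(broken)
        message = None
    except RuntimeError as exc:
        message = str(exc)

    fixed = {"script": "true", "pre-script": "", "post-script": ""}
    prune_empty_safe(fixed)
    return message, fixed
```

Note that no threads are needed to hit this: any code path that mutates the dict before the iteration has finished is enough.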

Reproducible Example

I've not had time to explore this in great detail. Roughly, I am doing the following:

  • pause
  • kill B (B is a family within A)
  • broadcast to C (C is not related to A or B) <---- this is where the error happens
  • sleep 10
  • hold A
  • remove B (B is a family within A)
  • sleep 10
  • play
  • trigger new flow from a specific task
  • broadcast D (D is a family within A) to mimic skip mode (noting there are no xtriggers relating to D)
    e.g.
    cylc broadcast \
        -p "$CYLC_TASK_CYCLE_POINT" \
        -n "$namespace" \
        -s 'platform=localhost' \
        -s 'script=true' \
        -s 'pre-script=' \
        -s 'post-script=' \
        -s 'err-script=' \
        -s 'exit-script=' \
        "$CYLC_WORKFLOW_NAME"
  • broadcast to C to remove the old broadcast
  • release A

Expected Behaviour

No failure should be seen.

@ColemanTom ColemanTom added the bug Something is wrong :( label Jul 11, 2024
@ColemanTom
Contributor Author

That last broadcast to D isn't actually needed, so I'm going to remove that, but there is still some race condition at play here I think.

@oliver-sanders oliver-sanders added this to the 8.3.3 milestone Jul 11, 2024
@oliver-sanders
Member

Unfortunately the promise library is rather good at hiding the origin of the actual error (which is not line 10 in iterate_promise.py) making this tricky to debug.

I suspect this isn't a new bug in 8.3.2 but something that's been lurking for a while that's hard to activate. The best thing I can think to do is to hammer a workflow with the commands above until one of them fails. If we managed to replicate it in this way, then we can start subtracting commands until we have a minimal reproducible example. We can then single step the logic from within the scheduler to locate the point of breakage.
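
The "hammer until something fails, then subtract commands" approach could be sketched roughly as below. This is an illustration under assumptions, not an existing Cylc tool: `hammer` is a hypothetical helper that takes a list of argv lists and reruns them round after round until one exits non-zero.

```python
import subprocess


def hammer(commands, max_rounds=100):
    """Run each command in order, round after round, until one fails.

    Returns (round_number, argv, stderr) for the first failing command,
    or None if every round passed.
    """
    for round_number in range(1, max_rounds + 1):
        for argv in commands:
            result = subprocess.run(argv, capture_output=True, text=True)
            if result.returncode != 0:
                # Stop at the first failure so the context is preserved.
                return round_number, argv, result.stderr
    return None
```

Each entry in `commands` would be one of the steps listed in the report (for example an argv list beginning `["cylc", "broadcast", ...]`). Once a failure is reproduced, entries can be dropped from the list one at a time to shrink it towards a minimal example.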

@ColemanTom
Contributor Author

I think I may know what caused me to trigger this. I'll try to find time to create a simple test workflow to help. It was exposed on my end because I accidentally made an infinite task loop following the above steps.

@elliotfontaine

Hi! Just wanted to say I encountered it too. It looks like a race condition to me, since the broadcasting task succeeded on a retry.
It never happened to me before updating to 8.3.x; this is the first time. I have used in-task broadcasting every day since February.

[runtime]
    [[_catch_raw]]
        script = """
            cylc broadcast "${CYLC_WORKFLOW_ID}" \
                -p "${CYLC_TASK_CYCLE_POINT}" \
                -s "[environment]RAWFILE_PATH=${catch_raw_file}"

            cylc broadcast "${CYLC_WORKFLOW_ID}" \
                -p "${CYLC_TASK_CYCLE_POINT}" \
                -s "[environment]RAWFILE_STEM=$(basename "$catch_raw_file" .raw)"
        """
        [[[meta]]]
            title = Catch Raw
            description = """
                This helper task follows the `catch_raw` external trigger, and propagates raw file
                path and stem to downstream tasks.
            """

@oliver-sanders
Member

I've tried to reproduce this problem by performing multiple broadcasts in a task, and running that task in parallel. So far I've not managed to replicate the bug. There's probably some other factor involved.

@wxtim wxtim modified the milestones: 8.3.3, 8.3.4 Jul 23, 2024
@MetRonnie MetRonnie changed the title graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration Broadcast - GraphQLLocatedError: dictionary changed size during iteration Aug 9, 2024
@oliver-sanders
Member

oliver-sanders commented Aug 30, 2024

Sadly, I just cannot reproduce this one. Here's my latest attempt:

  • Run a workflow.
  • Set up a script to broadcast to it every second.
  • Set up a script to trigger tasks every 5 seconds.
  • Set up a script to remove a task every 15 seconds.
  • Set up a script to reload every 30 seconds.
  • Set up a script to restart every 45 seconds.
  • Leave running overnight.

I've also been scanning for dictionary changed size during iteration in workflow logs at our site but haven't managed to find anything.
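
For anyone wanting to run the same kind of scan over their own logs, here is a minimal sketch. It assumes the scheduler logs are plain-text `*.log` files somewhere under a directory you pass in; the function name and defaults are hypothetical, and the log location depends on your installation.

```python
from collections import Counter
from pathlib import Path

MESSAGE = "dictionary changed size during iteration"


def count_error_lines(log_dir, pattern="*.log", message=MESSAGE):
    """Count lines containing `message` in each matching file under log_dir."""
    counts = Counter()
    for path in sorted(Path(log_dir).rglob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            hits = sum(1 for line in handle if message in line)
        if hits:
            counts[str(path)] = hits
    return counts
```

Files with no matches are left out of the result, so an empty `Counter` means the message was not found anywhere under `log_dir`.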

Unfortunately, it doesn't look like there's anything more we can do about this for now. If you are still experiencing the issue, please let us know and add any context that might help here. I doubt it will reveal much, but running workflows in --debug mode might help.

The traceback reported is not actually coming from the Cylc code, it's coming from the "promise" package which is a dependency of the GraphQL tools that we use. We will need to refresh our Python GraphQL toolchain soon which will remove this dependency, so if it is an issue in the underlying library, we should be rid of it then.

@elliotfontaine
Copy link

Not that big of a deal honestly, especially as automated retries mitigate it 👍

@oliver-sanders oliver-sanders modified the milestones: 8.3.4, 8.3.5 Sep 11, 2024
@oliver-sanders oliver-sanders added needs reproducing A bug report that does not yet have a reproducible example and removed investigation labels Sep 26, 2024
@oliver-sanders oliver-sanders modified the milestones: 8.3.5, 8.3.x Oct 4, 2024
@ColemanTom
Contributor Author

ColemanTom commented Oct 7, 2024

I just hit this three times in succession (30-second retries, three times in a row, after which the task gave up retrying and failed). Only a broadcast was used. This was with Cylc 8.3.4.

+0S:L4+ cylc broadcast -p 20241006T1200Z -n WAIT_TASKS_EXCEPT_FIRST_ONE_004 -s platform=localhost -s script=true -s pre-script= -s post-script= -s err-script= -s exit-script= access_g4_pp_grp11
ERROR: [{'error': {'message': 'dictionary changed size during iteration', 'traceback': ['graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration\n']}}]
[FAIL] umwait-${CMD_VERSION}.sh $ROSE_TASK_SUFFIX $CHECK_ME # return-code=1

Context - there would have been lots of tasks running at once, maybe broadcasts from different sources in parallel.

Looking in the logs, I've seen it a few times, but this is the first time it's happened multiple times in a row for the same task. From the logs I can grep out this:

/05-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/05-start-01.log:2024-10-06T13:18:58Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:2024-10-06T18:37:08Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/08-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/08-start-01.log:2024-10-06T23:35:38Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:2024-10-07T04:35:09Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:2024-10-07T04:35:11Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:2024-10-07T04:35:52Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:2024-10-07T04:36:33Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/02-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/02-start-01.log:2024-10-06T08:38:36Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/04-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/04-start-01.log:2024-10-06T13:19:21Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/04-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/04-start-01.log:2024-10-06T13:19:58Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:2024-10-06T18:37:11Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:2024-10-06T18:37:23Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:2024-10-06T18:38:02Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/08-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/08-start-01.log:2024-10-06T23:35:54Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/09-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/09-start-01.log:2024-10-07T04:35:04Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/03-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/03-start-01.log:2024-10-06T05:59:13Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/03-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/03-start-01.log:2024-10-06T05:59:16Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/10-start-01.log:2024-10-07T01:53:14Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/06-start-01.log:2024-10-06T15:56:18Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/09-start-01.log:graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
/09-start-01.log:2024-10-07T01:53:12Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration

In this particular case, here is the grep for broadcast\|graphql

2024-10-07T04:34:44Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_005'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:34:47Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_007'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:07Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_003'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:08Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_008'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:09Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_002'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:35:09Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:35:10Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_009'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:10Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_004'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:35:11Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:35:21Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_006'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:32Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_001'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:49Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_002'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-07T04:35:51Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_004'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:35:52Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:36:33Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T1200Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_004'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-07T04:36:33Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
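
To make a grep like this easier to read, each timestamped ERROR line can be paired with the broadcast logged immediately before it. The sketch below assumes the log layout shown above; the function name and regular expressions are hypothetical. Adjacency in the log only shows ordering, so it does not prove that the preceding broadcast caused the error.

```python
import re

# Matches the namespaces=[...] part of a logged broadcast call.
NAMESPACES = re.compile(r"namespaces=\[([^\]]*)\]")
# Matches only the timestamped ERROR line, not the bare traceback line.
ERROR_LINE = re.compile(
    r"^(\S+Z) ERROR - .*dictionary changed size during iteration"
)


def errors_with_preceding_broadcast(lines):
    """Pair each timestamped ERROR with the namespaces logged just before it."""
    pairs = []
    last_namespaces = None
    for line in lines:
        match = NAMESPACES.search(line)
        if match:
            last_namespaces = [
                name.strip().strip("'\"")
                for name in match.group(1).split(",")
                if name.strip()
            ]
            continue
        match = ERROR_LINE.match(line)
        if match:
            pairs.append((match.group(1), last_namespaces))
    return pairs
```

Feeding it the lines of a scheduler log returns a list of `(timestamp, namespaces)` tuples, which makes it quicker to see whether the same namespace keeps appearing next to the error.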

And a different, once-off case

2024-10-06T23:35:03Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_017'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:35:17Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_014'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:35:22Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_015'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:35:26Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_016'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:35:26Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_013'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:35:46Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_011'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:35:54Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_012'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-06T23:35:54Z ERROR - graphql.error.located_error.GraphQLLocatedError: dictionary changed size during iteration
2024-10-06T23:35:56Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_010'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])
2024-10-06T23:36:33Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20241006T0000Z'], mode=put_broadcast, namespaces=['WAIT_TASKS_EXCEPT_FIRST_ONE_012'], settings=[{'platform': 'localhost'}, {'script': 'true'}, {'pre-script': ''}, {'post-script': ''}, {'err-script': ''}, {'exit-script': ''}])

I'm not running in debug mode and would prefer not to if I can avoid it. If there is anything else to grep from the logs I can do that though.

My best guess is that the trigger is either broadcasts from a set of tasks running at around the same time, or broadcasts arriving whilst lots of other tasks are running, with new tasks being inserted into the N-window and old ones removed from it.
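
As an illustration of how that kind of interleaving could produce the error without any threads, here is a small asyncio sketch. It is not based on the Cylc internals; `store`, `read_all` and `add_entry` are hypothetical stand-ins for a shared dict, a reader that yields control mid-iteration, and a concurrent update.

```python
import asyncio


async def read_all(store, snapshot):
    """Visit every key, yielding to the event loop between items."""
    seen = []
    # With snapshot=True we iterate a copy of the keys, not the live dict.
    keys = list(store) if snapshot else store
    for key in keys:
        seen.append(key)
        await asyncio.sleep(0)  # another coroutine may run here
    return seen


async def add_entry(store, key):
    """Stand-in for a concurrent update, e.g. a new entry being inserted."""
    store[key] = True


async def run(snapshot):
    """Return the keys seen, or the error message if iteration broke."""
    store = {"a": True, "b": True}
    try:
        seen, _ = await asyncio.gather(
            read_all(store, snapshot), add_entry(store, "c")
        )
        return seen
    except RuntimeError as exc:
        return str(exc)
```

With `snapshot=False` the reader is interrupted between keys, the dict grows, and the next step of the loop raises; with `snapshot=True` the reader finishes over the keys that existed when it started.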

@oliver-sanders
Member

Thanks for the report. I don't think there's going to be much more info you can glean from the logs (with or without debug mode) in this case.

I just need to find a way to reproduce this locally. I'll try scaling up parallel broadcasts as far as I can, see if that does it.

@ColemanTom
Contributor Author

One suggestion, in case you aren't already doing so: use remote platforms for the tasks that broadcast. Maybe the extra latency causes an issue?

@oliver-sanders oliver-sanders removed the needs reproducing A bug report that does not yet have a reproducible example label Oct 8, 2024
@oliver-sanders
Member

oliver-sanders commented Oct 8, 2024

Replicated!!!

I had to push the scaling really, really far to encounter the issue (probably why I had failed to replicate it before).

This example seems to reliably reproduce the issue within ~60 seconds. It's running ~50,000 broadcasts in parallel! I put the tasks onto an external job runner (don't try running these locally, they will take out your box) but I used a local job runner:

[task parameters]
    x = 1..100
    [[templates]]
        x = x_%(x)03d

[scheduling]
    initial cycle point = 1
    cycling mode = integer
    [[graph]]
        P1 = """
            <x>[-P1] => <x>
        """

[runtime]
    [[<x>]]
        script = """
            for x in $(seq 1 50); do
                cylc broadcast "${CYLC_WORKFLOW_ID}" \
                    -n "x_$(printf '%03d' "$x")" \
                    -p "$(( CYLC_TASK_CYCLE_POINT + 1 ))" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s "[environment]x_${RANDOM}=${RANDOM}" \
                    -s 'pre-script=true' \
                    -s 'env-script=true' \
                    -s 'post-script=' \
                    -s 'exit-script=' \
                    -s 'err-script=' &
            done
            wait
            sleep $(( RANDOM % 3 ))
        """

@oliver-sanders oliver-sanders self-assigned this Oct 8, 2024
@oliver-sanders oliver-sanders modified the milestones: 8.3.x, 8.3.5 Oct 8, 2024
@oliver-sanders
Member

Simple fix, will put this into 8.3.5.

oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Oct 8, 2024
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Oct 9, 2024
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Oct 9, 2024
@ColemanTom
Contributor Author

I believe this can be closed now?

@hjoliver
Member

Closed by #6397

@ColemanTom
Contributor Author

@hjoliver and @oliver-sanders - any chance I could convince you to put out 8.3.5 for this fix? The bug is hitting me somewhat regularly in my large workflows and I want to start routine distribution of data to downstream users for UAT.

@hjoliver
Member

Yes we were looking to release 8.3.5 soon. How're you placed for that at the UK end @oliver-sanders ? (I'm out of time for today but can check the milestone status tomorrow).
