Some ensemble run directories don't get copied over to HPC #3019

Closed
Aariq opened this issue Aug 26, 2022 · 0 comments · Fixed by #3025

Aariq commented Aug 26, 2022

Bug Description

When running an ED2 model, start_model_runs() sometimes fails to copy some ensemble run directories to the remote host (HPC). One possible reason is that rsync is currently run inside a for-loop, and the server may limit how many connections can be open at once or how often new connections can be made. It would be more efficient to rsync all the ensemble files over at once, outside of the for-loop, anyway, even if that doesn't fix this bug.

It's either happening here:

PEcAn.remote::remote.copy.to(
  host = settings$host,
  src = file.path(settings$rundir, run_id_string),
  dst = settings$host$rundir,
  delete = TRUE)
} # closes the enclosing for-loop (loop header not shown in this excerpt)

Or maybe here (can't remember)

out <- PEcAn.remote::start_qsub(
  run = run,
  qsub_string = settings$host$qsub,
  rundir = settings$rundir,
  host = settings$host,
  host_rundir = settings$host$rundir,
  host_outdir = settings$host$outdir,
  stdout_log = "stdout.log",
  stderr_log = "stderr.log",
  job_script = "job.sh")
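A rough sketch of what "rsync everything at once" could look like, in case it helps the discussion. This is not how PEcAn.remote does it: `build_rsync_args()` is a hypothetical helper, the host name and paths are made up, and the `-a` and `--delete` flags are assumptions chosen to mirror `delete = TRUE` above. The point is only that rsync accepts several source directories in one invocation, so a single connection can carry every run directory.

```r
# Hypothetical helper (not part of PEcAn): build the argument vector for ONE
# rsync call that copies every run directory, instead of one call per run.
build_rsync_args <- function(rundir, run_ids, host_name, host_rundir,
                             delete = TRUE) {
  # Sources without a trailing slash, so each directory itself is copied
  srcs <- file.path(rundir, run_ids)
  # `if` returns NULL when delete is FALSE, and c() drops NULL
  flags <- c("-a", if (delete) "--delete")
  # Trailing slash on the destination: copy the sources *into* this directory
  dest <- paste0(host_name, ":", file.path(host_rundir, ""))
  c(flags, srcs, dest)
}

args <- build_rsync_args(
  rundir = "/tmp/pecan/run",          # assumed local run directory
  run_ids = c("ENS-00001", "ENS-00002"),
  host_name = "hpc.example.edu",      # assumed host name
  host_rundir = "/scratch/pecan/run"  # assumed remote run directory
)

# The actual transfer would then be a single call, e.g.:
# status <- system2("rsync", args)
# if (status != 0) stop("rsync exited with status ", status)
```

Checking the exit status of that one call would also give a single place to raise an error if the transfer fails.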

To Reproduce

Difficult to reproduce, sorry; the failure is intermittent.

Expected behavior

All files for ensemble runs should be copied over, and if they can't be, there should be an informative warning or error.
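One way the "informative warning" part might work, as a minimal sketch: compare the run IDs that should exist against a listing of the remote run directory and warn about any that are absent. `check_runs_copied()` is a hypothetical name, and how `remote_listing` is obtained (for example, by running `ls` on the host over ssh) is left as an assumption.

```r
# Hypothetical check (not part of PEcAn): report run directories that are
# expected but absent from a listing of the remote run directory.
check_runs_copied <- function(expected_ids, remote_listing) {
  missing_ids <- setdiff(expected_ids, remote_listing)
  if (length(missing_ids) > 0) {
    warning(
      length(missing_ids), " run director",
      if (length(missing_ids) == 1) "y" else "ies",
      " not found on remote host: ",
      paste(missing_ids, collapse = ", "),
      call. = FALSE
    )
  }
  # Return the missing IDs invisibly so callers can retry just those runs
  invisible(missing_ids)
}

# Example with made-up IDs: ENS-00002 never arrived on the host
missing_ids <- suppressWarnings(
  check_runs_copied(
    expected_ids = c("ENS-00001", "ENS-00002", "ENS-00003"),
    remote_listing = c("ENS-00001", "ENS-00003")
  )
)
```

Returning the missing IDs rather than only warning would let the caller retry the copy for just those runs, or stop before submitting jobs.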
