Allow PBS CI to create workers #54

Merged: 16 commits merged into master from pbs_ci on May 22, 2020

Conversation

@ocaisa (Contributor) commented May 14, 2020

No description provided.

@ocaisa (Contributor, Author) commented May 15, 2020

@AdamWlodarczyk In 1c17093 I wanted to clean up some warnings, and that has now triggered a failing test that I cannot figure out.

@ocaisa (Contributor, Author) commented May 15, 2020

@AdamWlodarczyk I can fix the problem with 7d0fba9, but I really don't understand why that is not allowed. Not closing the local cluster seems to kick up a lot of warnings.

@AdamWlodarczyk (Collaborator) commented

@ocaisa this is strange. I don't understand it either. I will take a look at it on the weekend.

@AdamWlodarczyk (Collaborator) commented May 19, 2020

@ocaisa, I'm still debugging this and it is super strange. Somehow closing the local cluster multiple times, and therefore creating it multiple times, affects the MPI task subprocess execution: it returns with returncode == 1.

@AdamWlodarczyk (Collaborator) commented

@ocaisa, news from the front: if you comment out the test_flush_and_abort and test_mpi_deserialize_and_execute tests, the remaining tests (especially test_mpi_wrap_execution, where the problems appeared) pass. So the problem probably lies in those two. I'll keep looking.

@AdamWlodarczyk (Collaborator) commented

The set_task_mpi_comm and flush_and_abort functions from mpi_wrapper.py seem to be causing the trouble. Both have the import line from mpi4py import MPI in common.
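
A common way to avoid this kind of import-time side effect is to defer the mpi4py import into the function bodies, so that merely importing the module (for example during pytest collection) never initializes MPI. A minimal sketch of that pattern; the function names come from the comment above, but the bodies are illustrative stand-ins rather than the project's actual code:

```python
# Sketch: defer `from mpi4py import MPI` into the functions so that simply
# importing this module does not initialize MPI or touch the environment.

def set_task_mpi_comm(parent_comm=None):
    """Illustrative stand-in: obtain an MPI communicator only when the task runs."""
    from mpi4py import MPI  # deferred: no MPI initialization at module import

    return parent_comm if parent_comm is not None else MPI.COMM_WORLD


def flush_and_abort(comm=None, error_code=1):
    """Illustrative stand-in: flush output streams and abort the MPI job."""
    import sys
    from mpi4py import MPI  # deferred for the same reason

    sys.stdout.flush()
    sys.stderr.flush()
    (comm if comm is not None else MPI.COMM_WORLD).Abort(error_code)
```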

@ocaisa (Contributor, Author) commented May 20, 2020

@AdamWlodarczyk Ok, maybe there is some sense here: mpi4py may set some environment variables that prevent mpiexec from running. I'll play around with it a bit.
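
If that turns out to be the cause, one possible workaround is to launch mpiexec from the test with an environment stripped of MPI/PMI-related variables inherited from the parent process. A hypothetical sketch; the variable prefixes and the example command are assumptions, not taken from this repository:

```python
import os
import subprocess

# Hypothetical prefixes that an already-initialized MPI runtime (e.g. via
# `from mpi4py import MPI`) might leave behind in the environment.
MPI_ENV_PREFIXES = ("OMPI_", "PMI_", "PMIX_", "HYDRA_", "MPIEXEC_")


def run_mpiexec(args):
    """Run mpiexec in a subprocess with MPI-related variables removed."""
    env = {k: v for k, v in os.environ.items()
           if not k.startswith(MPI_ENV_PREFIXES)}
    return subprocess.run(["mpiexec"] + list(args), env=env)


# Example usage (illustrative):
# result = run_mpiexec(["-n", "2", "python", "-m", "some_mpi_task"])
# assert result.returncode == 0
```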

@ocaisa (Contributor, Author) commented May 20, 2020

I think I could get this to work if I could enforce an order on the tests. I'm trying to do that, but I'm failing so far.
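
For reference, the pytest-ordering plugin (referenced later in this thread) lets tests declare an explicit order via the run marker. A minimal sketch, using the test names mentioned above with placeholder bodies:

```python
import pytest

# With pytest-ordering installed, `@pytest.mark.run(order=N)` forces these
# tests to run in the given sequence regardless of their position in the file.

@pytest.mark.run(order=1)
def test_mpi_wrap_execution():
    ...  # run before anything that imports mpi4py


@pytest.mark.run(order=2)
def test_mpi_deserialize_and_execute():
    ...


@pytest.mark.run(order=3)
def test_flush_and_abort():
    ...  # aborts MPI, so keep it last
```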

@ocaisa (Contributor, Author) commented May 20, 2020

@AdamWlodarczyk Ok, I have something that works. There is a warning that I would like to ignore, but I don't know how to do that.
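
For reference, pytest can silence a specific warning either through the filterwarnings option in its configuration file or per test with the filterwarnings marker. A minimal sketch of the marker form; the warning message pattern is a placeholder, since the actual message is not quoted in this thread:

```python
import pytest


# The "ignore:<pattern>" filter suppresses warnings whose message matches the
# pattern; replace the placeholder with the real warning text.
@pytest.mark.filterwarnings("ignore:.*placeholder warning message.*")
def test_mpi_wrap_execution():
    ...
```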

@AdamWlodarczyk (Collaborator) commented

@ocaisa ftobia/pytest-ordering#57 mentions that warning. Well, let's count on them to fix it.

@AdamWlodarczyk (Collaborator) commented

I added just a small change.

@ocaisa (Contributor, Author) commented May 20, 2020

@AdamWlodarczyk This still needs a small amount of work: I tried to execute an MPI job on the slaves and it didn't work. I'll figure it out tomorrow.

@ocaisa (Contributor, Author) commented May 22, 2020

@AdamWlodarczyk ok, good to go now

AdamWlodarczyk merged commit a2d060b into master on May 22, 2020
ocaisa deleted the pbs_ci branch on January 15, 2021