semop lock error during 2D Classification #738
Comments
@arom4github Can you investigate this? This is happening in the code you introduced in
I just updated the original post: it's most likely related to "allow coarse sampling".
Does this happen on all datasets (such as our tutorial dataset) or only on this particular dataset?
Let me get that tutorial dataset and see if I can reproduce the issue there as well.
I ran the tutorial dataset up to Class2D after auto-picking with LoG and did not get those error messages.
Did you process the tutorial dataset on the same machine using the same binary as the failing dataset? What if you extract the tutorial dataset to the same box size (in pixels) as the failing one?
Yes. Same Ubuntu and same RELION.
Both sound like unlikely causes. Are the particles centered well? (Are rlnOriginX/YAngst large?)
No response for a while; if this still happens in 4.0, please reopen.
I just encountered this in RELION 3.1.2 on Ubuntu 20.04. Adding some debug info to Line 201 in fa923df
When it fails, errno is set to EINVAL, as the semaphore with semid was removed whilst relion_refine_mpi was running. This blog post appears relevant, but as @biochem-fan implies, updating to RELION-4 will also fix the problem.
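The failure mode described above can be reproduced outside RELION. The following is a minimal sketch, not RELION's code: it assumes Linux, calls the System V semaphore functions in libc through ctypes, and uses a hypothetical helper name (semop_after_removal). The numeric constants are the Linux values from the system headers. It creates a private semaphore set, removes it to simulate an external deletion, and then calls semop on the stale id.

```python
import ctypes
import errno

# Load the C library already linked into the Python process (Linux assumed).
libc = ctypes.CDLL(None, use_errno=True)


class Sembuf(ctypes.Structure):
    # Mirrors struct sembuf from <sys/sem.h>.
    _fields_ = [
        ("sem_num", ctypes.c_ushort),
        ("sem_op", ctypes.c_short),
        ("sem_flg", ctypes.c_short),
    ]


# Linux values of the relevant constants (assumption: not portable).
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0
IPC_NOWAIT = 0o4000

libc.semget.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]
libc.semget.restype = ctypes.c_int
libc.semop.argtypes = [ctypes.c_int, ctypes.POINTER(Sembuf), ctypes.c_size_t]
libc.semop.restype = ctypes.c_int


def semop_after_removal():
    """Hypothetical helper: call semop on a semaphore set that was removed.

    Returns (return code of the second semop, symbolic errno name).
    """
    semid = libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)
    if semid == -1:
        raise OSError(ctypes.get_errno(), "semget failed")

    op = Sembuf(0, 1, IPC_NOWAIT)  # increment semaphore 0, never block

    # While the set exists, the operation succeeds.
    if libc.semop(semid, ctypes.byref(op), 1) == -1:
        err = ctypes.get_errno()
        libc.semctl(semid, 0, IPC_RMID)
        raise OSError(err, "semop failed on a live semaphore set")

    # Simulate something else deleting the set while the job is running.
    libc.semctl(semid, 0, IPC_RMID)

    ctypes.set_errno(0)
    rc = libc.semop(semid, ctypes.byref(op), 1)
    return rc, errno.errorcode.get(ctypes.get_errno())


if __name__ == "__main__":
    print(semop_after_removal())
```

On Linux the second call is expected to fail with EINVAL because the id no longer refers to an existing set, which matches the errno reported above. Running `ipcs -s` while a job is active lists the semaphore sets that currently exist, so it can help confirm whether one has disappeared.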
Do you know what or who deletes the semaphore at the wrong time?
I think they are removed when the user logs out - that's what the man page for
The jobs where I saw these failures were submitted using SLURM, but from the node (workstation) that the jobs were running on, i.e. not from a login node on a cluster. I was logged on when the jobs started and had logged out before they failed. Presumably setting
Hi,
I'm getting a weird, stochastic "semop lock error" message during 2D classification.
Let me try my best to provide as much info as I can.
System setting:
2x Xeon (64 threads total), 252 GB RAM, 4x RTX 2080 Ti
OS:
Ubuntu 18.04 LTS
RELION versions:
I tried both 3.1.1 and 3.1.1-commit-9f3bf1
CUDA version:
Tried both CUDA 11 and CUDA 9.2 (compiled RELION separately for each)
Dataset:
Pixel size: 1.12, Voltage: 300, Cs: 2.7
Particle Box size:
300px but bin2
2D Class setting:
Optimization: 50 ~ 100, T: 2, Number of iteration: 25 ~ 30, mask diameter (A): 300
Sampling: alignment: Yes, angular sampling: 6, offset search range: 10, offset search step: 1, allow coarse sampling: NO
The job fails stochastically at a random iteration, but always at the end of the iteration, either at the beginning or somewhere in the middle of the maximization step.
Here's one example of output and error:
I even tried MPICH 3.4.1 instead of OpenMPI and still get the same error at random.
Any suggestions to get through this would be highly appreciated.
Thank you.
best,
hee jong kim