-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Container cache download #3163
base: dev
Are you sure you want to change the base?
Container cache download #3163
Conversation
… and are then copied to the cache if needed This allows getting rid of the confusing `output_path` variable.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files☔ View full report in Codecov by Sentry. |
593ffef
to
6293dd0
Compare
The code is there, but I still need to add unit-tests and documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your contribution, and sorry for the delayed review. I was on holiday. I think, your general approach is suitable, but there are unfortunately a few details that I am not yet happy with.
@@ -1086,12 +1086,13 @@ def get_singularity_images(self, current_revision: str = "") -> None: | |||
|
|||
# Organise containers based on what we need to do with them | |||
containers_exist: List[str] = [] | |||
containers_cache: List[Tuple[str, str, Optional[str]]] = [] | |||
containers_cache: List[Tuple[str, str, str]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, the cache is never optional, so the type List[Tuple[str, str, Optional[str]]]
can be simplified to List[Tuple[str, str, str]]
for containers_cache.
My rationale was that I wanted an identical type for containers_cache
, containers_download
and containers_pull
to somewhat standardize the functions consuming them. But with the new containers_library
variable, the heterogeneity is anyway given, so I have no real objections against.
Ultimately, this comment is just to explain why I initially had done it differently.
return (out_path, cache_path) | ||
library_path = None | ||
if os.environ.get("NXF_SINGULARITY_LIBRARYDIR"): | ||
library_path = os.path.join(os.environ["NXF_SINGULARITY_LIBRARYDIR"], out_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will not work. The registries are stripped from the out_name
and without symlinks, I doubt that there will be a single appropriately named container in the NXF_SINGULARITY_LIBRARYDIR
.
You will have to use the name prior to trimming the registries.
"""Copy Singularity image from NXF_SINGULARITY_LIBRARYDIR to target folder, and possibly NXF_SINGULARITY_CACHEDIR.""" | ||
self.singularity_copy_image(container, library_path, out_path) | ||
if cache_path: | ||
self.singularity_copy_image(container, library_path, cache_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not used the $NXF_SINGULARITY_LIBRARYDIR
so far, so I struggle to conceptualize corresponding setups. Intuitively, however, I question that copying the image to the cache is desirable? That will probably lead to a lot of data duplication?
Also mind that in the case of self.container_cache_utilisation == "amend"
, the cache_path
is assigned to the out_path
in singularity_image_filenames()
function. So if you retain this copy step, you should at least not copy the same image twice to cache, but making the self.singularity_copy_image(container, library_path, cache_path)
conditional depending on the chosen cache utilisation.
"""Copy Singularity image between folders. This function is used seamlessly | ||
across the target directory, NXF_SINGULARITY_CACHEDIR, and NXF_SINGULARITY_LIBRARYDIR.""" | ||
log.debug(f"Copying {container} to cache: '{os.path.basename(from_path)}'") | ||
shutil.copyfile(from_path, to_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but this function misses important functionality. If you factor out the copying process, you need to consider that copies may be interrupted by exceptions or by the user using SIGINT (CTRL+C).
Previously, in case of incomplete/corrupted downloads, the local files were deleted by the except
and finally
branches within the singularity_download_image()
function. (Lines 1356-1369 on dev). Additionally, it did not matter too much, since the copy happened from the cache to the output_path
, typically a folder the user would delete in case of download failures.
But now that you changed the logic, it seems more likely to me that a user could amass corrupted images in the cache folder persistently. Therefore, it is important to ensure that the copy process removes partially copied files etc. in case of exceptions or SIGINT, e.g. with a cleanup_temp_files()
function.
Something along this line:
import signal
import sys
# Example
def cleanup_temp_files():
if os.path.exists('temp_file'):
os.remove('temp_file')
# Define a signal handler for SIGINT (CTRL+C)
def abort_download(sig, frame):
cleanup_temp_files()
raise DownloadError("Aborting pipeline download due to user interruption.")
signal.signal(signal.SIGINT, abort_download)
# Example of using try-finally for cleanup in case of exceptions
try:
# File copy code here
# ...
except Exception as e:
cleanup_temp_files()
raise DownloadError(e) from e
finally:
cleanup_temp_files()
log.debug(f"Copying {container} to cache: '{os.path.basename(from_path)}'") | ||
shutil.copyfile(from_path, to_path) | ||
# Create symlinks to ensure that the images are found even with different registries being used. | ||
self.symlink_singularity_images(to_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind that the cleanup function should also cover the symlinks in the cache.
@@ -1361,8 +1389,8 @@ def singularity_download_image( | |||
log.debug(f"Deleting incompleted singularity image download:\n'{output_path_tmp}'") | |||
if output_path_tmp and os.path.exists(output_path_tmp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the cleanup function I mentioned. Now, it is not sufficient anymore, because the cache is not considered.
progress.update(task, description="Copying from cache to target directory") | ||
shutil.copyfile(cache_path, out_path) | ||
progress.update(task, description="Copying from target directory to cache") | ||
self.singularity_copy_image(container, out_path, cache_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to check now, if the cache_path actually exists on the file system:
os.path.exists(cache_path)
.
Previously, the get_singularity_images()
function ensured, that an image was actually present in cache before adding it to the list of images that are already cached.
if cache_path and os.path.exists(cache_path):
containers_cache.append((container, out_path, cache_path))
Hence, it was safe to copy it without further checks. Now, the cache_path
is just created from the environment variable without further checks if the defined directory actually exists in the file system and is writable. Both needs to be done either here just for the cache path or preferably inside the generic copy function - better safe than sorry.
Fixes #3019 and #3162.
First #3162. The code has to deal with an
out_path
(always defined) and sometimes acache_path
too. The implementation was slightly convoluted as it was doing fresh downloads into thecache_path
orout_path
(first decision point) and then doing an extra copy if needed (second decision point). Because of that confusion, symlinks across container registries were not all created across both locations.I propose to reverse the logic to make it more straightforward:
out_path
and create its symlinks.cache_path
(and create symlinks there).Then, #3019: I propose to handle the singularity "library" directory this way:
--container-library-utilisation
parameter for the library because i)remote
would be redundant with the--container-cache-utilisation
'sremote
mode, ii)amend
is not possible as per the read-only rule, so iii)copy
is the only possible mode.singularity pull
. WhenNXF_SINGULARITY_LIBRARYDIR
is set and the container exists in the library, it is copied to the target directories (out_path
and possiblycache_path
too)PR checklist
CHANGELOG.md
is updatedREADME.md