
Moved file-open operation in BinaryRecordingSegment out of constructor #3296

Open · wants to merge 2 commits into main

Conversation

cheydrick

I recently reported issue #3295 (BinaryRecordingExtractor opens the binary file but doesn't close it, preventing later deletion or modification of the binary file).

This PR fixes the problem that I described in that issue. I don't think it has any unwanted side-effects, though hopefully someone more familiar with the bigger picture will point out if it does!

chris added 2 commits August 8, 2024 16:39
…) method, and uses a with statement so that the file will close once it's no longer needed
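To illustrate the idea in this PR: instead of opening the file once in the constructor and holding the handle for the object's lifetime, the segment keeps only the path and opens the file inside the read method with a `with` statement. The sketch below is a minimal toy version of that pattern, not the actual SpikeInterface code; the class and parameter names are hypothetical.

```python
from array import array

class BinaryRecordingSegmentSketch:
    """Toy sketch: open the binary file per read instead of in __init__."""

    def __init__(self, file_path, num_channels, typecode="h"):
        # No open() here: only metadata is stored on the object,
        # so no file handle outlives a single read.
        self.file_path = file_path
        self.num_channels = num_channels
        self.typecode = typecode  # e.g. "h" = int16 samples

    def get_traces(self, start_frame, end_frame):
        itemsize = array(self.typecode).itemsize
        frame_bytes = itemsize * self.num_channels
        with open(self.file_path, "rb") as f:  # handle released on exit
            f.seek(start_frame * frame_bytes)
            raw = f.read((end_frame - start_frame) * frame_bytes)
        samples = array(self.typecode)
        samples.frombytes(raw)
        # One tuple per frame, num_channels values each
        return [tuple(samples[i:i + self.num_channels])
                for i in range(0, len(samples), self.num_channels)]
```

Because the handle is closed before `get_traces` returns, the file can be deleted or modified between reads.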
@alejoe91 alejoe91 added the core Changes to core module label Aug 9, 2024
@h-mayorquin
Collaborator

Sup, @cheydrick, hope you are doing well in Texas.

The behavior in your PR is what I wanted to implement, but @samuelgarcia pointed out a possible complication that we have not had time to test or experiment with.

The problem is that if the file is remote, it might take a while to open it (this is what we want to run experiments on). If I remember correctly, @samuelgarcia did some small experiments on his cluster and found a small effect in that direction. I still believe that this effect is a small price to pay for IO safety, but leaving the file handles open was the compromise we reached.

Can you test the solution that @alejoe91 proposed on the linked issue (to del the recording)?
If that does not work, we can provide a method that iterates over all the open files and closes them.
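Something like the helper below is what I have in mind. It is only a sketch of the idea; the attribute names (`_recording_segments`, `file`) are assumptions about the extractor's internals, not the actual SpikeInterface API.

```python
def close_all_files(recording):
    """Sketch: walk the recording's segments and close any open file handles.

    Assumes (hypothetically) that segments live in `_recording_segments`
    and keep their handle in a `file` attribute.
    """
    for segment in getattr(recording, "_recording_segments", []):
        handle = getattr(segment, "file", None)
        if handle is not None and not handle.closed:
            handle.close()
```

After calling it, the underlying files could be deleted or rewritten without destroying the recording object itself.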

We should do more conclusive experiments to assess the cost of re-opening the file on every call when the files are remote. Now that I think about it, maybe we could solve this with a flag. We will have to wait until Sam returns from his vacation to discuss it.

@cheydrick
Author

The suggestion that @alejoe91 made worked.

I had first tried setting recording = None, thinking that would force the cleanup, but using del recording worked to release the files.
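For anyone hitting the same thing, here is a toy illustration of the mechanism (not the actual extractor code; the class is hypothetical): an object that opens a file in its constructor keeps the handle alive until the object is finalized, and in CPython `del` on the last reference triggers finalization immediately, which releases the file.

```python
class HandleHolder:
    """Toy stand-in for an extractor that opens its file in __init__."""

    def __init__(self, path):
        self.file = open(path, "rb")  # handle stays open for the object's lifetime

    def __del__(self):
        # Runs when the last reference disappears, e.g. after `del holder`
        if not self.file.closed:
            self.file.close()
```

Once `del` has run, deleting the underlying file succeeds (this matters especially on Windows, where an open handle blocks deletion).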

I closed the issue, but I'll wait for someone else to close this PR in case the further discussion you described needs to happen.

Thanks for replying, @alejoe91 and @h-mayorquin!

@cheydrick
Author

I updated #3295 to note that running Kilosort 4 prevents the .bin file from being deleted, despite first forcing deletion of the recording object. This PR remedies that, too, though obviously it doesn't address the big picture as @h-mayorquin noted.

@paolahydra

> If that does not work, we can provide a method that iterates over all the open files and closes them.

Hi @h-mayorquin, I would be interested in this patch since I am having the same problem!
Thanks,
Paola

@samuelgarcia
Member

Hi.
@h-mayorquin exactly. I am pretty sure that opening/closing files has a cost on network drives, but I do not have numbers.
So I would not be in favor of this.

For me, when we do rec = read_something(), it is natural that the file is opened (resource locked) and cannot be deleted until del rec. Otherwise, what would be the behavior of rec.get_trace() after deletion?

@JoeZiminski
Collaborator

JoeZiminski commented Sep 9, 2024

Is it the case that every time a data chunk is accessed, the file would need to be reopened and memory-mapped? E.g., preprocessing a 1-hour recording in 2-second chunks would result in 1800 reopen/memory-map/close cycles. I am not so familiar with the performance cost of this operation, but it seems suboptimal if that is the case.
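To make the trade-off concrete, the two access patterns being compared look roughly like this (a sketch with made-up function names, not SpikeInterface code): one open/seek/close cycle per chunk versus a single persistent handle for all chunks.

```python
def read_chunks_reopen(path, chunk_size, n_chunks):
    """One fresh open/seek/close cycle per chunk (the per-read approach)."""
    chunks = []
    for i in range(n_chunks):
        with open(path, "rb") as f:
            f.seek(i * chunk_size)
            chunks.append(f.read(chunk_size))
    return b"".join(chunks)

def read_chunks_keep_open(path, chunk_size, n_chunks):
    """One persistent handle for all chunks (the current behavior)."""
    with open(path, "rb") as f:
        return b"".join(f.read(chunk_size) for _ in range(n_chunks))
```

Both return the same bytes; timing them on a local disk versus a network mount would give the numbers this thread is missing.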

However, I've also gotten confused trying to figure out how to close the file handle on the recording object in a similar situation. Can this be solved by documentation and API, e.g. can we add a recording.close_file() method? That way the file handle can be closed without deleting the entire recording object. We could also mention in the docs that extractors keep the file handle open unless it is explicitly closed with close_file().
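Roughly what I mean (a hypothetical sketch of the proposed API, not existing SpikeInterface code):

```python
class RecordingWithClose:
    """Sketch of a recording that exposes an explicit close_file() method."""

    def __init__(self, path):
        self._file = open(path, "rb")

    def close_file(self):
        # Release the handle without destroying the recording object
        if not self._file.closed:
            self._file.close()

    def get_traces(self):
        # Reading after close fails loudly instead of silently misbehaving
        if self._file.closed:
            raise RuntimeError("file handle is closed; re-open the recording to read traces")
        return self._file.read()
```

This also answers @samuelgarcia's question about rec.get_trace() after release: the method would raise a clear error rather than return stale data.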

@zm711
Collaborator

zm711 commented Sep 10, 2024

I think there is some work on the Neo side for changing how memmapping will work for recording objects, but it has been on the backburner for a while.
