Take action to stop the temporary directory being deleted due to lack of use #31732

Closed
droberts195 opened this issue Jul 2, 2018 · 5 comments · Fixed by #32615
Labels
>bug :Core/Infra/Core Core issues without another label :ml Machine learning

Comments

@droberts195
Contributor

Since #27609 Elasticsearch defaults to a per-run temporary directory. On Linux this is a directory created by the startup script under /tmp.

Linux systemd has functionality that can remove files and directories from /tmp that have not been used for a certain length of time. This functionality is described in man tmpfiles.d.

RHEL and CentOS 7 ship with a configuration for this functionality in /usr/lib/tmpfiles.d/tmp.conf that deletes files and directories that have not been touched for 10 days:

v /tmp 1777 root root 10d

(Note: If you read the man page you might think this doesn't delete old files, as the man page says:

  The age field only applies to lines starting with d, D, and x. If omitted or set to "-", no automatic clean-up is done.

However, the man page is wrong. Cleanup by age also applies to other configuration entries, including v. There are 7 letters it applies to in the code: https://github.com/systemd/systemd/blob/2479c4fe3fc3d0b631b93debbc2a83aa40a5f379/src/tmpfiles/tmpfiles.c#L1904

v is CREATE_SUBVOLUME in that switch.)

Currently the only part of Elasticsearch that uses java.io.tmpdir more than a few seconds after startup is ML. As a result, if someone does not start an ML job on a particular node that is running on RHEL or CentOS 7 then 10 days after ES startup the temporary directory is removed by tmpfiles.d functionality. If an ML job is run on the node after this then it fails because the temporary directory does not exist.

Because of the security manager, the ES JVM cannot recreate the temporary directory. Therefore the best solution would seem to be to periodically create and remove a file in the temporary directory. If we created and removed a file every 22 hours, the directory modification time would always fall within the last day, even on days when the start of daylight saving time reduces the day length to 23 hours. This would keep the directory alive even for a user who configured tmpfiles.d to clean after 1 day.
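The create-and-remove approach described above might look something like the following minimal Java sketch. The class name `TmpDirKeepAlive`, its method names, and the thread name are hypothetical and are not taken from the Elasticsearch codebase. A real implementation in core would presumably use Elasticsearch's own thread pool and logging rather than a bare `ScheduledExecutorService`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not the actual Elasticsearch implementation.
public class TmpDirKeepAlive {

    // 22 hours, as proposed in this issue, so that the directory is touched
    // at least once a day even on a 23-hour daylight saving day.
    static final long INTERVAL_HOURS = 22;

    // Creates and immediately deletes a file inside dir. Adding and removing
    // an entry updates the directory's modification time.
    static void touch(Path dir) throws IOException {
        Path marker = Files.createTempFile(dir, "keepalive", ".tmp");
        Files.delete(marker);
    }

    // Touches dir once now and then every INTERVAL_HOURS on a daemon thread.
    static ScheduledExecutorService start(Path dir) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "tmpdir-keepalive");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                touch(dir);
            } catch (IOException | RuntimeException e) {
                // Catch everything here: an uncaught exception would cancel
                // all future runs of a scheduleAtFixedRate task.
                System.err.println("could not touch " + dir + ": " + e);
            }
        }, 0, INTERVAL_HOURS, TimeUnit.HOURS);
        return scheduler;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        touch(tmp);
        System.out.println("touched " + tmp);
    }
}
```

Calling `TmpDirKeepAlive.start(dir)` once at startup would be enough in this sketch. Note that it only keeps an existing directory fresh. If the directory has already been removed, `touch` fails and the error is merely reported, since the JVM cannot recreate the directory.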

Since ML is currently the only affected component this periodic touching of the temporary directory could be done in the ML code. However, this problem could also affect 3rd party plugins that use java.io.tmpdir, so it would be nicer if the functionality to keep the temporary directory alive was in core Elasticsearch. The ML team can implement it if we can get some advice on the best place in the code to put it.

@droberts195 droberts195 added >bug :Core/Infra/Core Core issues without another label :ml Machine learning labels Jul 2, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Collaborator

Pinging @elastic/ml-core

@MorrieAtElastic
Contributor

Can I get a verification as to what version this is fixed in? Thanks

@davidkyle
Member

@MorrieAtElastic The fix hasn't been made yet. The workaround is to manually recreate the missing temp directory.

@droberts195
Contributor Author

@MorrieAtElastic this is now scheduled to be fixed in 6.4.0 by #32615.


4 participants