RPM update to 7.5.1 prevents start of elasticsearch #50631

Closed
erempel opened this issue Jan 4, 2020 · 17 comments · Fixed by #51827
Labels
:Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts Team:Delivery Meta label for Delivery team

Comments

@erempel

erempel commented Jan 4, 2020

Platform: Redhat EL 7.7 (systemd)
Previous ES package: elasticsearch-7.4.2-1.x86_64
Upgraded ES package: elasticsearch-7.5.1-1.x86_64

ES forum post at
https://discuss.elastic.co/t/elasticsearch-7-5-1-rpm-update-prevents-restart-of-service/213636

made me look into things more.

After the upgrade and host reboot, the ES service would not start and gave the error

Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: org.elasticsearch.cli.UserException: unable to create temporary keystore at [/etc/elasticsearch/elasticsearch.keystore.tmp], write permissions required for [/etc/elasticsearch] or run [elasticsearch-keystore upgrade]

I can confirm that there is a "posttrans scriptlet" in the RPM used during the upgrade, that the scriptlet contains the commands to perform the keystore upgrade, and that this scriptlet was run at the time of the update (we upgrade automatically starting at 04:00).

% ls -al /etc/elasticsearch/.elasticsearch.keystore.initial_md5sum
-rw-r--r-- 1 root elasticsearch 0 Jan 1 04:34 .elasticsearch.keystore.initial_md5sum

Adding group write permissions to the /etc/elasticsearch directory permitted the service to start, which created or rewrote the keystore file.

% ls -al /etc/elasticsearch/elasticsearch.keystore
-rw-rw---- 1 elasticsearch elasticsearch 199 Jan 2 14:57 elasticsearch.keystore

Two things I note

  1. The posttrans scriptlet sets the owner:group of the keystore file to root:elasticsearch; however, after the service started, the file is owned by elasticsearch:elasticsearch.

  2. The md5sum output file is of size 0, which indicates to me that the posttrans scriptlet failed to create the initial keystore file, or perhaps ran the "else" clause of the posttrans scriptlet which only created the md5sum file.

@romseygeek romseygeek added the :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts label Jan 6, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Packaging)

@williamrandolph williamrandolph self-assigned this Jan 8, 2020
@williamrandolph
Contributor

I'll look into this and raise the issue with the dev team if necessary.

@williamrandolph
Contributor

I'm having a tough time reproducing this in a clean environment. In particular, I can't see how the file .elasticsearch.keystore.initial_md5sum would have a size of zero.

The relevant script is posttrans:

if [ ! -f /etc/elasticsearch/elasticsearch.keystore ]; then
    /usr/share/elasticsearch/bin/elasticsearch-keystore create
    chown root:elasticsearch /etc/elasticsearch/elasticsearch.keystore
    chmod 660 /etc/elasticsearch/elasticsearch.keystore
    md5sum /etc/elasticsearch/elasticsearch.keystore > /etc/elasticsearch/.elasticsearch.keystore.initial_md5sum
else
    /usr/share/elasticsearch/bin/elasticsearch-keystore upgrade
fi

If the file exists and md5sum can read it, md5sum will be able to calculate a checksum. We'd get a size of 0 if md5sum hit an error reading /etc/elasticsearch/elasticsearch.keystore, but from the lines just above, I can't see how that would happen.

Are you running your RPM commands as root, or do you have a different permissions scheme in place? Are you using the keystore for secure settings for your cluster?

@erempel
Author

erempel commented Jan 13, 2020

I am equally confused by the zero size of the md5sum file. The update was done with a standard root initiated "yum update".

% ls -l /var/log/yum.log
-rw------- 1 root root 113867 Jan 8 12:05 /var/log/yum.log

% sudo fgrep elasticsearch /var/log/yum.log
Nov 26 10:05:37 Installed: elasticsearch-7.4.2-1.x86_64
Jan 01 04:40:51 Updated: elasticsearch-7.5.1-1.x86_64

Only root can write to the yum log, and the update was recorded by the yum update process.

I also agree that the relevant script is posttrans.

The only way that the md5sum could be zero is if the keystore file did not exist and the command

md5sum /etc/elasticsearch/elasticsearch.keystore > /etc/elasticsearch/.elasticsearch.keystore.initial_md5sum

was run.
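
A minimal sketch of why the checksum file can end up empty: the shell creates (or truncates) the redirect target before it runs the command, so a failure on the left side of `>` still leaves a zero-byte file behind. The temporary paths and the command name `md5sum_not_installed_demo` are made up for illustration, and the sketch assumes a system where `md5sum` is installed; it is not the scriptlet itself.

```shell
#!/bin/sh
# Sketch: a failed command on the left of '>' still leaves an empty output file.
# Everything lives in a throwaway directory; nothing under /etc is touched.
workdir=$(mktemp -d)

# Case 1: the input file does not exist, so md5sum fails.
status_missing=0
md5sum "$workdir/missing.keystore" > "$workdir/sum_missing" 2>/dev/null || status_missing=$?

# Case 2: the checksum tool itself cannot be found (hypothetical command name).
status_notool=0
md5sum_not_installed_demo "$workdir/missing.keystore" > "$workdir/sum_notool" 2>/dev/null || status_notool=$?

# Case 3: the input file exists, so a checksum line is written.
printf 'dummy keystore bytes' > "$workdir/present.keystore"
status_present=0
md5sum "$workdir/present.keystore" > "$workdir/sum_present" || status_present=$?

# -s is true only for files that exist and are larger than zero bytes.
for name in sum_missing sum_notool sum_present; do
    if [ -s "$workdir/$name" ]; then
        echo "$name: non-empty"
    else
        echo "$name: empty (size 0)"
    fi
done
```

Both failure cases leave a file of size 0, so the empty .elasticsearch.keystore.initial_md5sum on its own does not say which of them happened.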

@williamrandolph
Contributor

I've tried a number of different combinations of keystore modifications and Elasticsearch upgrades, and I'm unable to reproduce this behavior on an RHEL 7 Vagrant image.

Are you able to find a series of steps to reproduce the problem?

@Daych

Daych commented Jan 16, 2020

Jan 16 11:45:53 test206 systemd[1]: Starting Elasticsearch...
-- Subject: Unit elasticsearch.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit elasticsearch.service has begun starting up.
Jan 16 11:45:54 test206 elasticsearch[10611]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Jan 16 11:45:55 test206 elasticsearch[10611]: Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: org.elasticsearch.cli.UserException: unable to create temporary keystore at [/etc/elasticsearch/elasticsearch.keystore.tmp], write permissions required for [/etc/elasticsearch] or run [elasticsearch-keystore upgrade]
Jan 16 11:45:55 test206 elasticsearch[10611]: Likely root cause: java.nio.file.AccessDeniedException: /etc/elasticsearch/elasticsearch.keystore.tmp
Jan 16 11:45:55 test206 elasticsearch[10611]: at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
Jan 16 11:45:55 test206 elasticsearch[10611]: at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
Jan 16 11:45:55 test206 elasticsearch[10611]: at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
Jan 16 11:45:55 test206 elasticsearch[10611]: at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
Jan 16 11:45:55 test206 elasticsearch[10611]: at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)
Jan 16 11:45:55 test206 elasticsearch[10611]: at java.base/java.nio.file.Files.newOutputStream(Files.java:223)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:410)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:406)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:254)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.common.settings.KeyStoreWrapper.save(KeyStoreWrapper.java:484)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.bootstrap.Bootstrap.loadSecureSettings(Bootstrap.java:242)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:305)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.cli.Command.main(Command.java:90)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115)
Jan 16 11:45:55 test206 elasticsearch[10611]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
Jan 16 11:45:55 test206 elasticsearch[10611]: Refer to the log for complete error details.
Jan 16 11:45:55 test206 systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Jan 16 11:45:55 test206 systemd[1]: Failed to start Elasticsearch.
-- Subject: Unit elasticsearch.service has failed

@Daych

Daych commented Jan 16, 2020

I got the same issue when starting the service after installing ES 7.5.1 with "yum install elasticsearch".

@williamrandolph
Contributor

@Daych Was this a fresh installation, or was it an upgrade from a previous version of Elasticsearch?

Assuming Elasticsearch has been installed to /usr/share/elasticsearch, does the command sudo /usr/share/elasticsearch/bin/elasticsearch-keystore upgrade fix the problem for you?

@Daych

Daych commented Jan 17, 2020

@williamrandolph Hi William, I had installed Elasticsearch 6.8.5 before, then I removed all the files related to Elasticsearch before installing version 7.5.1.

@williamrandolph
Contributor

This issue continues to come up in various environments, and I still haven't tracked down why.

The intention is for the RPM scripts to handle all of the keystore setup so that the Elasticsearch application will not try to do any writing in the configuration directory. Our work in this direction fell under #28928 and #41755.

@williamrandolph
Contributor

A summary of where we are right now:

The intended workaround for this problem is running sudo /usr/share/elasticsearch/bin/elasticsearch-keystore upgrade and then restarting Elasticsearch, but I would still like to understand how the RPM upgrade failure is happening in the first place.

When you're running Elasticsearch as a systemd service, your keystore needs to be set up correctly before you start the Elasticsearch service. It needs to be in the latest keystore format, and it needs to have a value for keystore.seed. The elasticsearch-keystore upgrade command will take care of both of these things if you run it before starting Elasticsearch. Otherwise, the Elasticsearch service will try to upgrade the keystore on startup, and will likely fail because we have deliberately restricted its write permissions during runtime.

Thus, it's easy to create a situation where you get an "unable to create temporary keystore" error on startup by either (1) deleting the keystore after upgrading the RPM but before starting Elasticsearch or (2) removing the keystore.seed value from the keystore. In either case, the fix is to run sudo /usr/share/elasticsearch/bin/elasticsearch-keystore upgrade.

When we upgrade the RPM, we intend to have the posttrans scriptlet take care of the keystore situation. This scriptlet should be the very last scriptlet to run during the upgrade process. If the keystore exists as a regular file, then we run the upgrade command; if it doesn't exist, we create it.
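
The two preconditions that can be checked from a shell (the keystore file exists, and it contains keystore.seed) can be sketched as a small pre-start check. The function name `keystore_ready` and its arguments are hypothetical; the sketch relies only on the `list` subcommand of elasticsearch-keystore, which prints one setting name per line. It does not check the keystore format version, and it must run as a user that can read the keystore.

```shell
#!/bin/sh
# Sketch: report whether a keystore looks ready before the service starts.
# Usage: keystore_ready CONF_DIR KEYSTORE_TOOL
#   CONF_DIR       for example /etc/elasticsearch
#   KEYSTORE_TOOL  for example /usr/share/elasticsearch/bin/elasticsearch-keystore
keystore_ready() {
    conf_dir=$1
    tool=$2

    # The keystore must exist as a regular file.
    if [ ! -f "$conf_dir/elasticsearch.keystore" ]; then
        echo "missing: $conf_dir/elasticsearch.keystore" >&2
        return 1
    fi

    # Look for an exact 'keystore.seed' line in the tool's listing.
    if ! ES_PATH_CONF="$conf_dir" "$tool" list 2>/dev/null | grep -qx 'keystore.seed'; then
        echo "keystore.seed not found in keystore" >&2
        return 1
    fi
    return 0
}
```

If a call such as `keystore_ready /etc/elasticsearch /usr/share/elasticsearch/bin/elasticsearch-keystore` fails, running the upgrade command mentioned above before starting the service is the next thing to try.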

In the case above where the .elasticsearch.keystore.initial_md5sum had size 0, there likely would have been an error displayed in the stdout of the yum upgrade process. Unfortunately, it doesn't seem like that error would be logged anywhere under a default setup. I managed to force this outcome by removing md5sum from the root shell path. Although this change didn't replicate the error, it did create error output in the yum stdout:

Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : elasticsearch-7.5.1-1.x86_64 
/var/tmp/rpm-tmp.1sqM6w: line 75: md5sum: command not found  # <-- this is the error message
  Cleanup    : elasticsearch-7.4.2-1.x86_64
  Verifying  : elasticsearch-7.5.1-1.x86_64
  Verifying  : elasticsearch-7.4.2-1.x86_64

Updated:
  elasticsearch.x86_64 0:7.5.1-1

If anyone runs into this problem and is able to view output from the yum upgrade command, there might be helpful information there.

One way the posttrans script could fail and leave an md5sum file with size 0 is if the script doesn't have the md5sum utility on its path. On a system where this upgrade fails, I'd be curious to know the output of sudo sh -c 'echo $PATH' and which md5sum.
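
A small sketch of that check: it tests whether md5sum resolves on a given PATH value. The helper name `tool_on_path` is made up for this example. Note that `command -v` also succeeds for shell builtins and functions, which does not matter for md5sum.

```shell
#!/bin/sh
# Sketch: check whether a tool resolves on a given PATH value.
# Usage: tool_on_path NAME PATH_VALUE
tool_on_path() {
    # The subshell keeps the PATH change from leaking into the caller.
    ( PATH=$2; command -v "$1" >/dev/null 2>&1 )
}

if tool_on_path md5sum "$PATH"; then
    echo "md5sum found on PATH"
else
    echo "md5sum NOT found on PATH: $PATH"
fi
```

To approximate what a scriptlet sees, the script would need to run in a root shell (for example through sudo sh), since root's PATH can differ from an interactive user's.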

I would like to fix this problem for our users, but I need enough information to reproduce it first.

@erempel
Author

erempel commented Jan 31, 2020

Reinstalling the same 7.5.1 RPM, I get the following error, which prevents posttrans from running correctly.

Running transaction
Installing : elasticsearch-7.5.1-1.x86_64 1/1
ES_PATH_CONF must be set to the configuration path
warning: %posttrans(elasticsearch-0:7.5.1-1.x86_64) scriptlet failed, exit status 1
Non-fatal POSTTRANS scriptlet failure in rpm package elasticsearch-7.5.1-1.x86_64

@williamrandolph
Contributor

@erempel Ah, that's very interesting. The ES_PATH_CONF value is supposed to be set in the /etc/sysconfig/elasticsearch script, which gets loaded when posttrans runs elasticsearch-keystore. Does your /etc/sysconfig/elasticsearch file have the line ES_PATH_CONF=/etc/elasticsearch?
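
One way to answer that question on many hosts at once is a quick grep. This is a sketch only: the function name `sets_es_path_conf` is hypothetical, and it treats any uncommented, non-empty ES_PATH_CONF assignment as "set" without checking the value.

```shell
#!/bin/sh
# Sketch: does a sysconfig-style file contain an active ES_PATH_CONF assignment?
# Usage: sets_es_path_conf FILE
sets_es_path_conf() {
    # Match "ES_PATH_CONF=<something>" at line start, optionally preceded by
    # whitespace or "export"; commented-out lines (leading '#') do not match.
    # A missing or unreadable file counts as "not set".
    grep -Eq '^[[:space:]]*(export[[:space:]]+)?ES_PATH_CONF=.' "$1" 2>/dev/null
}

sysconfig=${1:-/etc/sysconfig/elasticsearch}
if sets_es_path_conf "$sysconfig"; then
    echo "ES_PATH_CONF is set in $sysconfig"
else
    echo "ES_PATH_CONF is not set in $sysconfig"
fi
```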

@erempel
Author

erempel commented Jan 31, 2020

No, it does not.
I guess that is an oversight in our configuration of elasticsearch. We were getting ready to abandon that file in favor of the systemd unit.service file configuration structures (/etc/systemd/system/elasticsearch.conf.d/XXX).

I thought that we had included all of the un-commented settings from the original elasticsearch.conf file and set any settings in our ansible deploy tasks.

I guess I will be adding this one back in.

@williamrandolph
Contributor

@erempel I'm able to reproduce the error now. I hope that adding ES_PATH_CONF=/etc/elasticsearch to /etc/sysconfig/elasticsearch can be a workaround in the short term. But I think it's strange and certainly not obvious that this setting needs to be in place for RPM upgrades to work correctly.

I'm going to see if I can put out a fix so that if an error like this happens in the scriptlet, the RPM upgrade will fail with a reasonable message, and to make sure that ES_PATH_CONF gets a default setting during RPM upgrades.
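
A rough sketch of what those two ideas could look like in shell, under stated assumptions: this is not the actual change that went into the packaging scripts, the function name `keystore_posttrans` and the `KEYSTORE_TOOL` override are hypothetical, and the chown, chmod and md5sum steps of the real scriptlet are left out to keep the example short.

```shell
#!/bin/sh
# Sketch: default ES_PATH_CONF and fail loudly if the keystore step errors.
# KEYSTORE_TOOL is a hypothetical override that makes the sketch testable.
keystore_posttrans() {
    # Fall back to the packaged default when the variable is unset or empty.
    ES_PATH_CONF="${ES_PATH_CONF:-/etc/elasticsearch}"
    export ES_PATH_CONF
    tool="${KEYSTORE_TOOL:-/usr/share/elasticsearch/bin/elasticsearch-keystore}"
    keystore="$ES_PATH_CONF/elasticsearch.keystore"

    # Same branch as the scriptlet: upgrade an existing keystore, else create one.
    if [ -f "$keystore" ]; then
        action=upgrade
    else
        action=create
    fi

    # Surface the failure with context instead of continuing silently.
    if ! "$tool" "$action"; then
        echo "elasticsearch-keystore $action failed (ES_PATH_CONF=$ES_PATH_CONF)" >&2
        return 1
    fi
}
```

Whether a non-zero exit from a posttrans scriptlet can actually abort the transaction is a separate question; the yum output earlier in this thread reports such a failure as non-fatal.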

Thank you for reporting this issue and helping me figure out how to reproduce it.

@williamrandolph
Contributor

It looks like we've already done something to address this problem on the development branch. See #50158 and #50246.

@jamshid

jamshid commented Oct 19, 2020

FWIW I ran into similar problems because an Elasticsearch 5.6 version of /etc/sysconfig/elasticsearch (which does not have ES_PATH_CONF) was placed into an Elasticsearch 6.8 install. Upgrading to 7.5.2 reported a lot of errors and I had to manually fix the keystore.

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020