ES Hang after reinstall (Have been discussed in the discuss.elastic.co) #21902

tianchao-haohan · 2016-12-01T03:41:26Z

https://discuss.elastic.co/t/elasticsearch-hang-after-reinstall/67425

Elasticsearch version: 5.0.0-alpha1

Plugins installed: []

JVM version: 1.8.0_60

OS version: SUSE Enterprise 12

Description of the problem including expected versus actual behavior:
I removed 5.0.0-alpha1 with rpm -ev and then installed elasticsearch-2.4.1. Expected: Elasticsearch 2.4.1 starts. Actual: it fails to start.

Steps to reproduce:

1. Install elasticsearch-5.0.0-alpha1.
2. Uninstall elasticsearch-5.0.0-alpha1.
3. Install elasticsearch-2.4.1.
4. Start Elasticsearch: sudo systemctl start elasticsearch
5. Check the process with ps -ef | grep elastic:
elastic+ 6400 1 0 03:17 pts/1 00:00:00 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.4.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -d -p /var/run/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch
6. /bin/netstat -nap | grep 9200 finds nothing.
7. curl -XGET localhost:9200 fails, even though network.host in elasticsearch.yml is set to 0.0.0.0.

Provide logs (if relevant): No logs are written under /var/log/elasticsearch.

jstack output for the process:
"Signal Dispatcher" #5 daemon prio=9 os_prio=0 tid=0x00007fcfc0172000 nid=0x4dd0 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" #4 daemon prio=9 os_prio=0 tid=0x00007fcfc0170800 nid=0x4dcf waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fcfc0134000 nid=0x4dce in Object.wait() [0x00007fcf734fb000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000c00070b8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000c00070b8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fcfc0132000 nid=0x4dcd in Object.wait() [0x00007fcf735fc000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000c0006af8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
- locked <0x00000000c0006af8> (a java.lang.ref.Reference$Lock)

"main" #1 prio=5 os_prio=0 tid=0x00007fcfc0009800 nid=0x4db9 runnable [0x00007fcfc7955000]
java.lang.Thread.State: RUNNABLE
at sun.nio.fs.UnixNativeDispatcher.stat0(Native Method)
at sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:286)
at sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:70)
at sun.nio.fs.UnixFileStore.devFor(UnixFileStore.java:55)
at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:70)
at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:48)
at sun.nio.fs.LinuxFileSystem.getFileStore(LinuxFileSystem.java:112)
at sun.nio.fs.UnixFileSystem$FileStoreIterator.readNext(UnixFileSystem.java:213)
at sun.nio.fs.UnixFileSystem$FileStoreIterator.hasNext(UnixFileSystem.java:224)
- locked <0x00000000c0b74e48> (a sun.nio.fs.UnixFileSystem$FileStoreIterator)
at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:515)
at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:459)
at org.apache.lucene.util.IOUtils.spins(IOUtils.java:448)
at org.elasticsearch.env.ESFileStore.<init>(ESFileStore.java:57)
at org.elasticsearch.env.Environment.<init>(Environment.java:90)
at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:81)
at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:107)
at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:100)

I'm sure I have removed all Elasticsearch-related files, and ps -ef shows no other Java process.
I also tried ES 5.x, and it hangs in the same way.
I also tried starting ES directly with this command:
"/usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.4.2.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.conf=/etc/elasticsearch"
The ES process hung again, and no log files were created:
c4dev@si-portal-server:> ls /var/log/elasticsearch/
c4dev@si-portal-server:>

jasontedor · 2016-12-18T17:06:14Z

This:

"main" #1 prio=5 os_prio=0 tid=0x00007fcfc0009800 nid=0x4db9 runnable [0x00007fcfc7955000]
java.lang.Thread.State: RUNNABLE
at sun.nio.fs.UnixNativeDispatcher.stat0(Native Method)
at sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:286)
at sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:70)
at sun.nio.fs.UnixFileStore.devFor(UnixFileStore.java:55)
at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:70)
at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:48)
at sun.nio.fs.LinuxFileSystem.getFileStore(LinuxFileSystem.java:112)
at sun.nio.fs.UnixFileSystem$FileStoreIterator.readNext(UnixFileSystem.java:213)
at sun.nio.fs.UnixFileSystem$FileStoreIterator.hasNext(UnixFileSystem.java:224)
- locked <0x00000000c0b74e48> (a sun.nio.fs.UnixFileSystem$FileStoreIterator)
at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:515)
at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:459)
at org.apache.lucene.util.IOUtils.spins(IOUtils.java:448)
at org.elasticsearch.env.ESFileStore.<init>(ESFileStore.java:57)
at org.elasticsearch.env.Environment.<init>(Environment.java:90)
at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:81)
at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:107)
at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:100)

What filesystem and disks are you on?
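One way to answer that from inside the JVM is to walk the same JDK iterator that the stuck "main" thread is in (UnixFileSystem$FileStoreIterator) and time each step. The class name ListFileStores and its output format are hypothetical; this is a diagnostic sketch, not part of Elasticsearch. The assumption is that if a single mount point makes stat() slow or block (for example, an unreachable network mount), the large gap will appear next to that entry, and if there are simply very many stores, the final count will show it.

```java
import java.io.PrintStream;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.Iterator;

public class ListFileStores {
    // Walks every file store the JDK reports, printing its type, name and the
    // time spent reaching it; returns the number of stores seen.
    static int enumerate(PrintStream out) {
        Iterator<FileStore> it = FileSystems.getDefault().getFileStores().iterator();
        int count = 0;
        long start = System.nanoTime();
        long last = start;
        // On Linux, hasNext() builds the next FileStore, which stat()s its mount point
        // (the FileStoreIterator.hasNext -> ... -> stat0 frames in the jstack output).
        while (it.hasNext()) {
            FileStore store = it.next();
            long now = System.nanoTime();
            count++;
            out.printf("%6d ms  %-10s %s%n", (now - last) / 1_000_000, store.type(), store.name());
            last = now;
        }
        out.printf("%d file stores in %d ms%n", count, (System.nanoTime() - start) / 1_000_000);
        return count;
    }

    public static void main(String[] args) {
        enumerate(System.out);
    }
}
```

Running it with the same JVM (java ListFileStores) on the affected host should show whether enumeration itself hangs, independently of Elasticsearch.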

ANorwell · 2016-12-27T16:26:12Z

I seem to be getting this as well in the Java ES client, on a very large ZFS installation (many disks and thousands of datasets). Ideally, neither the ES client nor the server should need to care about these datasets or iterate over them.

ANorwell · 2016-12-27T18:01:14Z

To add some more detail here:

The initialization of org.elasticsearch.env.Environment is quadratic in the number of FileStores returned by java.nio.file.FileSystems.getDefault().getFileStores(). For me, this can be 30k+ filestores, because the underlying file system is ZFS.

The quadratic cost arises because:

1. The static initialization of org.elasticsearch.env.Environment loops over all filestores, creating an ESFileStore object for each one.
2. The initialization of ESFileStore invokes org.apache.lucene.util.IOUtils.spins, which calls IOUtils.getFileStore, which itself loops over every filestore to find the one matching the provided path. (The initialization also scans /proc/self/mountinfo for the matching entry, which is likewise quadratic in the number of mounted filestores. In my case most filestores are not mounted, so mountinfo has only a few entries.)
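The nested loop described above can be modelled with plain lists to see how the cost grows. In this sketch, hypothetical string identifiers stand in for FileStore objects and the code only counts how many entries each lookup examines; it does not call Lucene or Elasticsearch, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class FileStoreScanCost {
    // Hypothetical list of n distinct identifiers standing in for FileStore objects.
    static List<String> syntheticStores(int n) {
        List<String> stores = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            stores.add("store-" + i);
        }
        return stores;
    }

    // Models the per-path lookup: scan the full store list until a match is found,
    // returning how many entries were examined.
    static int entriesExaminedToFind(List<String> stores, String target) {
        int examined = 0;
        for (String candidate : stores) {
            examined++;
            if (candidate.equals(target)) {
                break;
            }
        }
        return examined;
    }

    // Outer loop over every store, with a fresh scan of the whole list for each one.
    static long naiveCost(List<String> stores) {
        long total = 0;
        for (String store : stores) {
            total += entriesExaminedToFind(stores, store);
        }
        return total;
    }

    // If the store already in hand were passed through, each store would cost one unit.
    static long directCost(List<String> stores) {
        return stores.size();
    }

    public static void main(String[] args) {
        List<String> stores = syntheticStores(1000);
        System.out.println("naive:  " + naiveCost(stores));  // 500500 = 1000 * 1001 / 2
        System.out.println("direct: " + directCost(stores)); // 1000
    }
}
```

With N stores the naive total is N(N+1)/2, so for 30,000 stores that is 30,000 × 30,001 / 2 = 450,015,000 examined entries. Judging by the stack trace in this issue, each entry the JDK iterator yields involves constructing a LinuxFileStore and a stat() call, which would plausibly make startup look like a hang.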

Some possible inefficiencies I see here:

- The filestore is already known. Either IOUtils could accept it as part of its interface or, if changing Lucene is hard, the spins heuristic could be reimplemented in Elasticsearch, since it is only a few lines of code.
- /proc/self/mountinfo could be read only once and preprocessed into a mount point => device number lookup.
- Ideally the Java Elasticsearch client should not care which disks exist; this seems specific to the server.
- Is it really necessary for the server to know every filestore that exists? In some environments the vast majority are unrelated to ES data storage.
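The second suggestion (read /proc/self/mountinfo once and build a lookup) could look roughly like the sketch below. It is not Elasticsearch code: the class MountInfoIndex and its methods are hypothetical, the field positions (field 3 is major:minor, field 5 is the mount point) follow the mountinfo layout documented in proc(5), and keeping the last entry for a repeated mount point is an assumption that the last mount is the visible one. The first sample line is the proc(5) example; the second is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MountInfoIndex {
    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    // mountinfo escapes space, tab, newline and backslash in paths as \ooo (e.g. "\040").
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length()
                    && isOctal(s.charAt(i + 1)) && isOctal(s.charAt(i + 2)) && isOctal(s.charAt(i + 3))) {
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 4), 8));
                i += 3; // skip the three octal digits
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Single pass: mount point (field 5) -> "major:minor" device number (field 3).
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> deviceByMountPoint = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(" ");
            if (fields.length < 5) {
                continue; // not a well-formed mountinfo line
            }
            // Assumption: for stacked mounts on one path, the last entry is the visible one.
            deviceByMountPoint.put(unescape(fields[4]), fields[2]);
        }
        return deviceByMountPoint;
    }

    // Reads the real file once; later lookups are HashMap gets instead of rescans.
    static Map<String, String> load() throws IOException {
        return parse(Files.readAllLines(Paths.get("/proc/self/mountinfo")));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue",
                "40 22 0:45 / /data\\040disk rw,relatime shared:20 - zfs tank/data rw,xattr");
        Map<String, String> index = parse(sample);
        System.out.println(index.get("/mnt2"));      // 98:0
        System.out.println(index.get("/data disk")); // 0:45
    }
}
```

Building the map costs one pass over mountinfo, after which each filestore's lookup is constant time on average, instead of one scan of the file per filestore.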

colings86 · 2017-03-31T14:34:42Z

@jasontedor is this still an issue?

colings86 · 2017-03-31T14:35:09Z

@ANorwell are you still seeing this problem on the released versions of 5.x?

jasontedor · 2017-03-31T14:35:59Z

No additional feedback, closing.

jasontedor · 2017-04-29T13:12:48Z

@ANorwell See #24402.

jasontedor added the feedback_needed label Dec 18, 2016

colings86 added the :Core/Infra/Core label (Core issues without another label) Mar 21, 2017

jasontedor closed this as completed Mar 31, 2017
