[BUG] [remote store] Error in uploading file - Too many open files #9689

Closed
tlfeng opened this issue Sep 1, 2023 · 1 comment
Labels
bug (Something isn't working) · Storage:Remote · Storage (Issues and PRs relating to data and metadata storage)

Comments

@tlfeng (Collaborator) commented Sep 1, 2023

Describe the bug
Found while running performance test #8874. The error appeared in only one cluster, on Aug 31st.

[2023-08-31T18:06:23,785][ERROR][o.o.i.t.t.BlobStoreTransferService] [ip-10-0-5-251.ec2.internal] Failed to upload blob translog-130893.ckp

java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/24/translog/translog-130893.ckp: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:114) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.lambda$uploadBlobs$2(BlobStoreTransferService.java:98) [opensearch-2.10.0.jar:2.10.0]
	at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlobs(BlobStoreTransferService.java:93) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:131) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:268) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:241) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:191) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$43(IndexShard.java:4130) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4144) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

[2023-08-31T18:06:23,785][ERROR][o.o.i.t.t.TranslogTransferManager] [ip-10-0-5-251.ec2.internal] [logs-2400gb][24] Exception during transfer for file translog-130893.ckp

org.opensearch.index.translog.transfer.FileTransferException: java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/24/translog/translog-130893.ckp: Too many open files
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:145) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.lambda$uploadBlobs$2(BlobStoreTransferService.java:98) [opensearch-2.10.0.jar:2.10.0]
	at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlobs(BlobStoreTransferService.java:93) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:131) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:268) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:241) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:191) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$43(IndexShard.java:4130) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4144) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

Caused by: java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/24/translog/translog-130893.ckp: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:114) ~[opensearch-2.10.0.jar:2.10.0]
	... 19 more

[2023-08-31T18:06:23,878][ERROR][o.o.i.t.t.TranslogTransferManager] [ip-10-0-5-251.ec2.internal] [logs-2400gb][24] Transfer failed for snapshot TranslogTransferSnapshot [ primary term = 1, generation = 130893 ]

java.io.IOException: Failed to upload 1 files during transfer
	at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:149) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:268) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:241) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:191) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$43(IndexShard.java:4130) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4144) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
	Suppressed: org.opensearch.index.translog.transfer.FileTransferException: java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/24/translog/translog-130893.ckp: Too many open files
		at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:145) ~[opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.transfer.BlobStoreTransferService.lambda$uploadBlobs$2(BlobStoreTransferService.java:98) ~[opensearch-2.10.0.jar:2.10.0]
		at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
		at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlobs(BlobStoreTransferService.java:93) ~[opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:131) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:268) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:241) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:191) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$43(IndexShard.java:4130) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4144) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
		at java.lang.Thread.run(Thread.java:833) [?:?]
	Caused by: java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/24/translog/translog-130893.ckp: Too many open files
		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
		at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181) ~[?:?]
		at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
		at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
		at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:114) ~[opensearch-2.10.0.jar:2.10.0]
		... 19 more

[2023-08-31T18:06:25,301][WARN ][o.o.i.e.Engine           ] [ip-10-0-5-251.ec2.internal] [logs-2400gb][6] failed to close engine

java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/6/index: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:298) ~[?:?]
	at java.nio.channels.FileChannel.open(FileChannel.java:357) ~[?:?]
	at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:465) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FSDirectory.syncMetaData(FSDirectory.java:279) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FilterDirectory.syncMetaData(FilterDirectory.java:96) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FilterDirectory.syncMetaData(FilterDirectory.java:96) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:907) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.SegmentInfos.commit(SegmentInfos.java:974) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.index.store.Store.commitSegmentInfos(Store.java:880) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.NRTReplicationEngine.commitSegmentInfos(NRTReplicationEngine.java:182) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.NRTReplicationEngine.closeNoLock(NRTReplicationEngine.java:482) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.Engine.failEngine(Engine.java:1317) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.failShard(IndexShard.java:1767) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.storeStats(IndexShard.java:1375) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:195) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.indexShardStats(IndicesService.java:648) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.statsByShard(IndicesService.java:602) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.stats(IndicesService.java:593) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.node.NodeService.stats(NodeService.java:227) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:105) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:56) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:328) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:324) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:454) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

[2023-08-31T18:06:25,302][WARN ][o.o.i.e.Engine           ] [ip-10-0-5-251.ec2.internal] [logs-2400gb][6] failed engine [Failing shard because of exception during storeStats]

java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/6/index: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:440) ~[?:?]
	at java.nio.file.Files.newDirectoryStream(Files.java:482) ~[?:?]
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:180) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:199) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.index.store.ByteSizeCachingDirectory.estimateSizeInBytes(ByteSizeCachingDirectory.java:76) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.ByteSizeCachingDirectory$1.refresh(ByteSizeCachingDirectory.java:113) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.ByteSizeCachingDirectory$1.refresh(ByteSizeCachingDirectory.java:95) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:68) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.ByteSizeCachingDirectory.estimateSizeInBytes(ByteSizeCachingDirectory.java:144) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.Store$StoreDirectory.estimateSize(Store.java:910) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.Store.stats(Store.java:476) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.storeStats(IndexShard.java:1373) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:195) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.indexShardStats(IndicesService.java:648) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.statsByShard(IndicesService.java:602) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.stats(IndicesService.java:593) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.node.NodeService.stats(NodeService.java:227) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:105) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:56) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:328) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:324) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:454) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

[2023-08-31T18:06:25,303][WARN ][o.o.i.c.IndicesClusterStateService] [ip-10-0-5-251.ec2.internal] [logs-2400gb][6] marking and sending shard failed due to [shard failure, reason [Failing shard because of exception during storeStats]]

java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/indices/kIVAEMeBTsKkPn6JPT4c3g/6/index: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:440) ~[?:?]
	at java.nio.file.Files.newDirectoryStream(Files.java:482) ~[?:?]
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:180) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:199) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.index.store.ByteSizeCachingDirectory.estimateSizeInBytes(ByteSizeCachingDirectory.java:76) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.ByteSizeCachingDirectory$1.refresh(ByteSizeCachingDirectory.java:113) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.ByteSizeCachingDirectory$1.refresh(ByteSizeCachingDirectory.java:95) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:68) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.ByteSizeCachingDirectory.estimateSizeInBytes(ByteSizeCachingDirectory.java:144) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.Store$StoreDirectory.estimateSize(Store.java:910) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.Store.stats(Store.java:476) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.storeStats(IndexShard.java:1373) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:195) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.indexShardStats(IndicesService.java:648) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.statsByShard(IndicesService.java:602) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.IndicesService.stats(IndicesService.java:593) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.node.NodeService.stats(NodeService.java:227) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:105) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:56) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:328) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:324) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:454) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

[2023-08-31T18:06:25,311][ERROR][o.o.g.G.AsyncLucenePersistedState] [ip-10-0-5-251.ec2.internal] Exception occurred when storing new meta data

org.opensearch.OpenSearchException: java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/_state/_1f.fdm: Too many open files
	at org.opensearch.ExceptionsHelper.convertToRuntime(ExceptionsHelper.java:84) ~[opensearch-core-2.10.0.jar:2.10.0]
	at org.opensearch.gateway.GatewayMetaState$LucenePersistedState.handleExceptionOnWrite(GatewayMetaState.java:595) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.gateway.GatewayMetaState$LucenePersistedState.setLastAcceptedState(GatewayMetaState.java:564) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.gateway.GatewayMetaState$AsyncLucenePersistedState$1.doRun(GatewayMetaState.java:443) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

Caused by: java.nio.file.FileSystemException: /home/ec2-user/opensearch/data/nodes/0/_state/_1f.fdm: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) ~[?:?]
	at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:484) ~[?:?]
	at java.nio.file.Files.newOutputStream(Files.java:228) ~[?:?]
	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:394) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:387) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:220) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:41) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:126) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsWriter(Lucene90StoredFieldsFormat.java:154) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:512) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:543) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:242) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1545) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1830) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.gateway.PersistedClusterStateService$MetadataIndexWriter.updateIndexMetadataDocument(PersistedClusterStateService.java:554) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.gateway.PersistedClusterStateService$Writer.updateMetadata(PersistedClusterStateService.java:769) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.gateway.PersistedClusterStateService$Writer.writeIncrementalStateAndCommit(PersistedClusterStateService.java:690) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.gateway.GatewayMetaState$LucenePersistedState.setLastAcceptedState(GatewayMetaState.java:560) ~[opensearch-2.10.0.jar:2.10.0]
	... 6 more
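For context, every failure above bottoms out in `Too many open files` (EMFILE): the process hit its per-process file-descriptor limit (`ulimit -n`). A quick diagnostic sketch for watching a JVM's descriptor count over time (Linux-only, since it reads `/proc/self/fd`; `FdUsage` is a hypothetical helper, not part of OpenSearch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FdUsage {
    /** Count this JVM's open file descriptors (Linux-only: lists /proc/self/fd). */
    static long openFdCount() throws IOException {
        try (Stream<Path> fds = Files.list(Path.of("/proc/self/fd"))) {
            return fds.count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("open fds: " + openFdCount());
    }
}
```

Comparing this count against the node's `ulimit -n` while a workload runs shows whether translog transfers (or anything else) are steadily leaking descriptors.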


@tlfeng tlfeng added bug Something isn't working untriaged labels Sep 1, 2023
@tlfeng tlfeng changed the title [BUG] [remote storage] Error in uploading file - Too many open files [BUG] [remote store] Error in uploading file - Too many open files Sep 1, 2023
@tlfeng tlfeng added Storage Issues and PRs relating to data and metadata storage Indexing:Replication Issues and PRs related to core replication framework eg segrep labels Sep 5, 2023
@kotwanikunal kotwanikunal added the Indexing Indexing, Bulk Indexing and anything related to indexing label Sep 19, 2023
@anasalkouz anasalkouz added Storage:Remote and removed Indexing Indexing, Bulk Indexing and anything related to indexing Indexing:Replication Issues and PRs related to core replication framework eg segrep labels Sep 21, 2023
@ashking94 (Member) commented:
There have been multiple performance fixes, and translog files are now trimmed after each refresh. We have run multiple performance tests and have not seen this issue in the last couple of months. Closing this issue.
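Independent of those fixes, the pattern that keeps descriptor usage bounded per upload is worth noting, since `FileChannel.open` sits at the top of the stack traces above. A minimal sketch (hypothetical `uploadBlob`, not the actual BlobStoreTransferService code or the actual fix): try-with-resources releases the channel even when the transfer throws, so a burst of failed transfers cannot by itself exhaust the fd limit.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UploadSketch {
    /**
     * Illustrative upload: open the local file, hand its bytes to the remote
     * store, and guarantee the descriptor is closed on every exit path.
     */
    static long uploadBlob(Path file) throws IOException {
        // try-with-resources closes the channel whether the body returns or throws.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.size(); // stand-in for streaming the bytes to the remote store
        }
    }
}
```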
