GEDS Spilling prematurely when unable to access S3 (MinIO) storage. #40

Open
ojundi03 opened this issue Jun 14, 2024 · 0 comments

Describe the bug

  • This bug was encountered while using GEDS-HDFS as tier-2 storage for Pravega.
  • While configured to spill to MinIO (S3), GEDS spills earlier than expected in the event of a MinIO outage. With the working directory set to a drive with 20GB of storage space, the expected behaviour during a MinIO outage is that GEDS fills up to ~70% of its capacity (~14GB) before throttling and errors are encountered.
  • In reality, only ~2.4GB is written to GEDS before throttling occurs.
  • In the logs, cURL errors 7 (Couldn't connect to server) and 28 (Timeout was reached) are shown repeatedly. In particular, the first instance of error 28 aligns with when the throttling begins.
  • I believe GEDS could last significantly longer during a MinIO outage, and that this is being hindered by some sort of timeout.
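To put the gap in numbers, here is a quick back-of-the-envelope check using only the figures quoted above. It assumes decimal gigabytes and that the ~70% fill threshold applies to the whole 20GB drive; the variable names are purely illustrative.

```python
# Figures taken from the description above; names are illustrative only.
CAPACITY_GB = 20.0             # size of the drive holding the working directory
EXPECTED_FILL_FRACTION = 0.70  # expected fill level before throttling (~70%)
OBSERVED_GB = 2.4              # amount actually written before throttling

expected_gb = CAPACITY_GB * EXPECTED_FILL_FRACTION
observed_fraction = OBSERVED_GB / CAPACITY_GB
shortfall_gb = expected_gb - OBSERVED_GB

print(f"expected before throttling: ~{expected_gb:.1f} GB")
print(f"observed before throttling: ~{OBSERVED_GB:.1f} GB "
      f"({observed_fraction:.0%} of capacity)")
print(f"shortfall: ~{shortfall_gb:.1f} GB")
```

In other words, throttling starts at roughly 12% of the drive's capacity rather than the expected ~70%.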

To Reproduce

  1. Follow steps 1.) and 2.) of the instructions at https://github.com/cloudskin-eu/pravega-geds to set up the Pravega-GEDS deployment.
  2. Run /setup-scripts/pravega-geds-install.sh to install the GEDS-integrated Pravega deployment on Kubernetes.
  3. Navigate to /experiment and run run-experiment.sh.
  4. Logs for the Pravega segment-store pod can be viewed through kubectl logs pravega-pravega-segmentstore-0. The error(s) should be visible in the logs.
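To check how the cURL errors are distributed in the segment-store log, and where the first error 28 shows up, something like the following could be used. This is a minimal sketch, assuming the error lines follow the `curlCode: <n>, <message>` format seen in the traces in this issue; the function name `summarize_curl_errors` and the small sample list are hypothetical, and in practice the lines would come from the saved `kubectl logs` output.

```python
import re
from collections import Counter

# Matches the tail of the GEDS error lines, e.g.
# "...: curlCode: 28, Timeout was reached"
CURL_LINE = re.compile(r"curlCode:\s*(\d+),\s*(.+)$")


def summarize_curl_errors(lines):
    """Count each curlCode and record the first line number where it appears."""
    counts = Counter()
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        match = CURL_LINE.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        counts[code] += 1
        first_seen.setdefault(code, lineno)
    return counts, first_seen


# Sample lines copied from the traces in this issue (not a full log).
SAMPLE = [
    "Caused by: java.io.IOException: Unable to file status: "
    "_system/containers/_sysjournal.container4.snapshot_info: "
    "curlCode: 7, Couldn't connect to server",
    "\tat com.ibm.geds.GEDS.nativeStatus(Native Method)",
    "Caused by: java.io.IOException: Unable to file status: "
    "_system/containers/_sysjournal.container7.snapshot_info: "
    "curlCode: 28, Timeout was reached",
]

counts, first_seen = summarize_curl_errors(SAMPLE)
for code in sorted(counts):
    print(f"curlCode {code}: {counts[code]} occurrence(s), "
          f"first at line {first_seen[code]}")
```

Comparing the line (or timestamp) of the first curlCode 28 with the point where throttling starts would confirm or rule out the correlation described above.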

Additional information

Configuration Used:
GEDS is configured using environment variables:

        options:
          pravegaservice.storage.layout: "CHUNKED_STORAGE"
          pravegaservice.storage.impl.name: "HDFS"
          hdfs.connect.uri: "hdfs://tier-2-geds"
          hdfs.fs.impl: "com.ibm.geds.hdfs.GEDSHadoopFileSystem"
        env:
          GEDS_METADATASERVER: "geds-metadataserver:4381"
          GEDS_LOCAL_STORAGE_PATH: "/tmp/pravega/cache"
          AWS_ACCESS_KEY_ID: "miniostorage"
          AWS_SECRET_ACCESS_KEY: "miniostorage"
          AWS_ENDPOINT_URL: "http://minio.pravega.svc.cluster.local:80"
          GEDS_CONFIGURE_S3_USING_ENV: "1"

Curl Code 7

java.util.concurrent.CompletionException: io.pravega.segmentstore.storage.chunklayer.ChunkStorageException: checkExists
	at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:751)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at io.pravega.common.concurrent.ThreadPoolScheduledExecutorService$ScheduledRunnable.run(ThreadPoolScheduledExecutorService.java:209)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.pravega.segmentstore.storage.chunklayer.ChunkStorageException: checkExists
	at io.pravega.storage.hdfs.HDFSChunkStorage.convertException(HDFSChunkStorage.java:367)
	at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:169)
	at io.pravega.segmentstore.storage.chunklayer.BaseChunkStorage.lambda$checkExistsAsync$3(BaseChunkStorage.java:89)
	at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:747)
	... 6 common frames omitted
Caused by: java.io.IOException: Unable to file status: _system/containers/_sysjournal.container4.snapshot_info: curlCode: 7, Couldn't connect to server
	at com.ibm.geds.GEDS.nativeStatus(Native Method)
	at com.ibm.geds.GEDS.status(GEDS.java:260)
	at com.ibm.geds.hdfs.GEDSHadoopFileSystem.getFileStatus(GEDSHadoopFileSystem.java:154)
	at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:164)
	... 8 common frames omitted
_system/containers/_sysjournal.container4.snapshot_info

Curl Code 28

	at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:751)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at io.pravega.common.concurrent.ThreadPoolScheduledExecutorService$ScheduledRunnable.run(ThreadPoolScheduledExecutorService.java:209)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.pravega.segmentstore.storage.chunklayer.ChunkStorageException: checkExists
	at io.pravega.storage.hdfs.HDFSChunkStorage.convertException(HDFSChunkStorage.java:367)
	at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:169)
	at io.pravega.segmentstore.storage.chunklayer.BaseChunkStorage.lambda$checkExistsAsync$3(BaseChunkStorage.java:89)
	at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:747)
	... 6 common frames omitted
Caused by: java.io.IOException: Unable to file status: _system/containers/_sysjournal.container7.snapshot_info: curlCode: 28, Timeout was reached
	at com.ibm.geds.GEDS.nativeStatus(Native Method)
	at com.ibm.geds.GEDS.status(GEDS.java:260)
	at com.ibm.geds.hdfs.GEDSHadoopFileSystem.getFileStatus(GEDSHadoopFileSystem.java:154)
	at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:164)
	... 8 common frames omitted
_system/containers/_sysjournal.container7.snapshot_info