Executor memory issue on CDSE #595
Comments
@mbuchhorn executor-cores needs to be set to 1. With the current setting, two tasks run in the same executor, so you need roughly twice the memory.
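To make the "twice the memory" point concrete, here is a rough sketch of the arithmetic. It assumes that executor memory plus overhead is shared evenly between the tasks running concurrently in one executor, which is a simplification, and the helper name `memory_per_task_gb` is made up for illustration. The numbers are the ones from the failing CDSE job (4G executor memory, 8G overhead).

```python
def memory_per_task_gb(executor_memory_gb: float, overhead_gb: float, executor_cores: int) -> float:
    """Rough memory budget per concurrent task, assuming an even split.

    Hypothetical helper for illustration only; real memory use is not
    divided this neatly between tasks.
    """
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    return (executor_memory_gb + overhead_gb) / executor_cores


# Settings from the failing CDSE job: 4G memory, 8G overhead.
with_two_cores = memory_per_task_gb(4, 8, 2)  # two tasks share one executor
with_one_core = memory_per_task_gb(4, 8, 1)   # one task per executor

print(f"executor-cores=2: ~{with_two_cores:.1f} GB per task")
print(f"executor-cores=1: ~{with_one_core:.1f} GB per task")
```

Under this assumption, dropping executor-cores from 2 to 1 doubles the budget available to each task without raising the requested memory.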
OK, the jobs fail in the final task, where it seems that all data is loaded in a single task on one executor. I'll try running it myself with adjusted settings.
The last merge_cubes process was basically setting a partitioner, which could explain this OOM. I added a potential workaround, which still has to be rolled out.
It seems the fix worked: I now have 16 tasks at the end instead of one. The only thing I still need is a bit more driver memory, as I was running with default options.
The job worked, and the cost is down to 76 credits thanks to higher parallelism and lower memory use! The key thing is these two extra parameters in the context of apply_dimension with target='bands':
The next step is to make this work without those parameters, after validating that the output is still correct.
I have a processing line that runs fine on the Terrascope backend, but the same processing line does not run on the CDSE backend: I always get executor memory issues.
I have already raised the executor-memory and/or executor-memoryOverhead settings several times, but the issue persists. On Terrascope I can run it with 3G executor memory and 2G overhead. On CDSE I already use about 2.5 times this amount and it still fails.
Last job ID: j-2311213ddcc94063ae7f28b03dad3b3e
Job settings for CDSE (last test):
OPENEO_EXTRACT_JOB_OPTIONS = {
"driver-memory": "4G",
"driver-memoryOverhead": "8G",
"driver-cores": "2",
"executor-memory": "4G",
"executor-memoryOverhead": "8G",
"executor-cores": "2",
"max-executors": "50",
"soft-errors": "true"
}
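For reference, a minimal sketch of the same options with the suggested change applied (executor-cores set to 1). The helper `with_single_core_executors` is a made-up name for illustration, and all other values are copied unchanged from the settings above. Whether the memory values can then be lowered is not something this sketch tries to answer.

```python
import copy

# Job options as posted in this issue (last CDSE test).
OPENEO_EXTRACT_JOB_OPTIONS = {
    "driver-memory": "4G",
    "driver-memoryOverhead": "8G",
    "driver-cores": "2",
    "executor-memory": "4G",
    "executor-memoryOverhead": "8G",
    "executor-cores": "2",
    "max-executors": "50",
    "soft-errors": "true",
}


def with_single_core_executors(options: dict) -> dict:
    """Return a copy of the options with one task per executor.

    Hypothetical helper: only "executor-cores" is changed, the input
    dict is left untouched.
    """
    adjusted = copy.deepcopy(options)
    adjusted["executor-cores"] = "1"
    return adjusted


adjusted_options = with_single_core_executors(OPENEO_EXTRACT_JOB_OPTIONS)
print(adjusted_options["executor-cores"])

# With the openeo Python client, a dict like this is usually passed as
# `job_options=adjusted_options` when the batch job is created
# (not executed here, since it needs a backend connection).
```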
Job IDs that also failed:
j-2311175da37e4fa997fc1e6c68007d37
j-231119e0c85b45368f81875cd73c6bc6
j-231119cec93544f08d81c6ee73580916
j-23112146b0c0410c897c563140644f5a