fix: reduce tfkeras iris const batch size #8633

Merged 1 commit on Jan 3, 2024

@garrett361 garrett361 commented Jan 3, 2024

Description

The current const.yaml configuration fails because its global batch size is too large for the data: the trial requests batches of 32, but the validation input has only 30 samples. This PR reduces the global batch size.

Test Plan

Run det e create const.yaml . from within the examples/computer_vision/iris_tf_keras dir and verify that the trial runs without failing.

Commentary (optional)

Original error message pre-fix:

[2024-01-03 18:48:56] [ac93e99e] Traceback (most recent call last):
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
<none> [2024-01-03 18:48:56] [ac93e99e]     return _run_code(code, main_globals, None,
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
<none> [2024-01-03 18:48:56] [ac93e99e]     exec(code, run_globals)
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/exec/harness.py", line 210, in <module>
<none> [2024-01-03 18:48:56] [ac93e99e]     sys.exit(main(args.train_entrypoint))
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/exec/harness.py", line 125, in main
<none> [2024-01-03 18:48:56] [ac93e99e]     controller = controller_class.from_trial(
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/keras/_tf_keras_trial.py", line 295, in from_trial
<none> [2024-01-03 18:48:56] [ac93e99e]     validation_data = keras._adapt_data_from_data_loader(
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/keras/_data.py", line 190, in _adapt_data_from_data_loader
<none> [2024-01-03 18:48:56] [ac93e99e]     return _ArrayLikeAdapter(x, y, batch_size, sample_weight)
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/keras/_data.py", line 103, in __init__
<none> [2024-01-03 18:48:56] [ac93e99e]     check.check_gt_eq(self._x_length, batch_size, "Batch size is too large for the input data.")
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/common/check.py", line 151, in check_gt_eq
<none> [2024-01-03 18:48:56] [ac93e99e]     return gt_eq(x, y, reason)
<none> [2024-01-03 18:48:56] [ac93e99e]   File "/run/determined/pythonuserbase/lib/python3.9/site-packages/determined/common/check.py", line 147, in gt_eq
<none> [2024-01-03 18:48:56] [ac93e99e]     raise CheckFailedError(msg)
<none> [2024-01-03 18:48:56] [ac93e99e] determined.common.check.CheckFailedError: CHECK FAILED! Got 30, expected value >= 32: Batch size is too large for the input data.
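
For readers unfamiliar with the check that fired, here is a minimal standalone sketch of the condition, assuming only what the traceback shows: the number of input samples must be at least the batch size. The names CheckFailedError, check_gt_eq, and fits below are local stand-ins written for illustration, not the actual source of determined.common.check, and the numbers 30 and 32 are taken from the error message.

```python
class CheckFailedError(Exception):
    """Local stand-in for the exception named in the traceback (illustrative only)."""


def check_gt_eq(x, y, reason):
    # Raise if x < y. The message format imitates the one in the log;
    # it is an assumption, not copied from the library's source.
    if x < y:
        raise CheckFailedError(
            f"CHECK FAILED! Got {x}, expected value >= {y}: {reason}"
        )


def fits(num_samples, batch_size):
    # True when at least one full batch can be drawn from the data.
    try:
        check_gt_eq(
            num_samples, batch_size, "Batch size is too large for the input data."
        )
    except CheckFailedError:
        return False
    return True


if __name__ == "__main__":
    # Values from the error message: 30 input samples, batch size 32.
    print(fits(30, 32))  # the failing configuration
    print(fits(30, 30))  # any batch size <= 30 passes this check
```

Under these assumptions, any batch size no larger than the smallest input array (30 samples here) passes the check, which is consistent with the fix of lowering the batch size in const.yaml.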

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

@cla-bot cla-bot bot added the cla-signed label Jan 3, 2024
netlify bot commented Jan 3, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit aad2c96
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/6595af1246e2920009d47857

@garrett361 garrett361 merged commit 134c2e1 into main Jan 3, 2024
69 of 83 checks passed
@garrett361 garrett361 deleted the fix-tfkeras-iris-const branch January 3, 2024 19:12
@dannysauer dannysauer modified the milestone: 0.27.1 Feb 6, 2024