possible Student T instability? #110
Comments
Did all of the failed runs include …? Can you try increasing the …?
yes, I think all successful and failed runs had …
okay -- can you compile a list of parameters for which runs succeeded and failed? maybe the failures have something in common.
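One way to look for a common factor is to record the settings of each run next to its outcome and keep only the settings that every failed run shares and no successful run has. The sketch below assumes each run is stored as a plain dictionary. The function name, parameter names, and values are made up for illustration and are not the actual runs from this thread.

```python
def common_failure_factors(runs):
    """Return {param: value} pairs shared by every failed run and by no successful run."""
    failed = [r["params"] for r in runs if not r["ok"]]
    passed = [r["params"] for r in runs if r["ok"]]
    if not failed:
        return {}
    # Start from the first failed run and keep only pairs present in all failed runs.
    shared = {
        k: v for k, v in failed[0].items()
        if all(p.get(k) == v for p in failed[1:])
    }
    # Drop pairs that also occur in a successful run; they cannot explain the failures.
    return {
        k: v for k, v in shared.items()
        if not any(k in p and p[k] == v for p in passed)
    }


# Hypothetical run records (illustrative only).
runs = [
    {"params": {"likelihood": "studentt", "image_layers": 2}, "ok": False},
    {"params": {"likelihood": "normal", "image_layers": 2}, "ok": False},
    {"params": {"likelihood": "studentt", "image_layers": 0}, "ok": True},
]
print(common_failure_factors(runs))  # {'image_layers': 2}
```

A result like this only flags a correlation; a control run that changes just the flagged setting is still needed to confirm it.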
not sure if this is causal, but the ev11 likelihood should be adjusted to use a shift in its bijectors for transformed variables:
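To illustrate the general idea (this is not the snippet from the comment above, which is not reproduced here): a positive-valued parameter is often obtained by pushing an unconstrained variable through a transform such as softplus, and adding a small shift keeps the result bounded away from zero even when the unconstrained variable drifts very negative. The function names and the `shift` value below are assumptions chosen for the sketch, which uses plain Python and not TensorFlow Probability bijectors.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def to_positive(x, shift=1e-6):
    # Shift applied after the transform, so the output is always >= shift.
    return softplus(x) + shift


print(softplus(-1000.0))     # 0.0, the transform alone underflows to zero
print(to_positive(-1000.0))  # 1e-06, the shift keeps the value strictly positive
```

In TensorFlow Probability the analogous construction would presumably chain a `Shift` bijector after the positivity transform; whether that matches the intended change to the likelihood is an assumption.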
note to self:
@DHekstra, is it true that the common factor in failed training runs was not Student T but rather image layers?
yes, that is true. this batch of runs did not include a no-image-layer "control". the no-image-layer case did complete without problems previously.
Okay, @DHekstra, please give version 0.3.5 a try when you have a chance.
See attached files. Performing two-step inference for data processed in CrystFEL by AP, Careless run by KIW. NLL term diverges. This seems to be the key part of the traceback:
`Traceback (most recent call last):
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/bin/careless", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/careless.py", line 9, in main
run_careless(parser)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/careless.py", line 53, in run_careless
history = model.train_model(
^^^^^^^^^^^^^^^^^^
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 173, in train_model
_history = train_step((self, data))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal' defined at (most recent call last):
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/bin/careless", line 8, in <module>
sys.exit(main())
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/careless.py", line 9, in main
run_careless(parser)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/careless.py", line 53, in run_careless
history = model.train_model(
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 173, in train_model
_history = train_step((self, data))
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 159, in train_step
history = model.train_step((data,))
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/engine/training.py", line 1050, in train_step
y_pred = self(x, training=True)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/engine/training.py", line 558, in call
return super().call(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/engine/base_layer.py", line 1145, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 121, in call
z_f = self.surrogate_posterior.sample(self.mc_sample_size)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/surrogate_posteriors.py", line 50, in sample
s = self.distribution.sample(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow_probability/python/distributions/distribution.py", line 1205, in sample
return self._call_sample_n(sample_shape, seed, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow_probability/python/distributions/distribution.py", line 1182, in _call_sample_n
samples = self._sample_n(
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow_probability/python/distributions/truncated_normal.py", line 251, in _sample_n
return tf.random.stateless_parameterized_truncated_normal(
Node: 'variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal'
Detected at node 'variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal' defined at (most recent call last):
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/bin/careless", line 8, in <module>
sys.exit(main())
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/careless.py", line 9, in main
run_careless(parser)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/careless.py", line 53, in run_careless
history = model.train_model(
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 173, in train_model
_history = train_step((self, data))
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 159, in train_step
history = model.train_step((data,))
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/engine/training.py", line 1050, in train_step
y_pred = self(x, training=True)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/engine/training.py", line 558, in call
return super().call(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/engine/base_layer.py", line 1145, in call
outputs = call_fn(inputs, *args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/variational.py", line 121, in call
z_f = self.surrogate_posterior.sample(self.mc_sample_size)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/careless/models/merging/surrogate_posteriors.py", line 50, in sample
s = self.distribution.sample(*args, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow_probability/python/distributions/distribution.py", line 1205, in sample
return self._call_sample_n(sample_shape, seed, **kwargs)
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow_probability/python/distributions/distribution.py", line 1182, in _call_sample_n
samples = self._sample_n(
File "/home/groups/brunger/kiwhite/software/anaconda3/envs/careless/lib/python3.11/site-packages/tensorflow_probability/python/distributions/truncated_normal.py", line 251, in _sample_n
return tf.random.stateless_parameterized_truncated_normal(
Node: 'variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal'
2 root error(s) found.
(0) INVALID_ARGUMENT: Invalid parameters
[[{{node variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal}}]]
[[variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal/_14]]
(1) INVALID_ARGUMENT: Invalid parameters
[[{{node variational_merging_model/TruncatedNormal_CONSTRUCTED_AT_top_level/sample/stateless_parameterized_truncated_normal/StatelessParameterizedTruncatedNormal}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_step_6249]`
careless_22576794.out.txt
careless_22576794.err.txt
inputs_params.log.txt
slurm_script.txt
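A diverging NLL can push the surrogate posterior's parameters to NaN or to a non-positive scale, and a truncated normal sampler cannot draw from such parameters, which would be consistent with the `Invalid parameters` error in the traceback. The sketch below is a hypothetical pre-sampling check, not part of careless or TensorFlow. The function name and the exact conditions are assumptions about what makes truncated normal parameters unusable.

```python
import math


def invalid_truncnorm_params(loc, scale, low, high):
    """Return the indices of entries whose truncated normal parameters are unusable."""
    bad = []
    for i, (m, s, a, b) in enumerate(zip(loc, scale, low, high)):
        finite = math.isfinite(m) and math.isfinite(s)
        # Comparisons with NaN are False, so NaN bounds also land in `bad`.
        if not (finite and s > 0.0 and a < b):
            bad.append(i)
    return bad


# Illustrative values: entry 1 has a NaN location, entry 2 has a zero scale.
loc = [1.0, float("nan"), 2.0]
scale = [0.5, 0.5, 0.0]
low = [0.0, 0.0, 0.0]
high = [math.inf, math.inf, math.inf]
print(invalid_truncnorm_params(loc, scale, low, high))  # [1, 2]
```

Running a check of this kind on the posterior parameters at the step before the crash would show whether NaNs from the diverging NLL are what the sampler is rejecting.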