Example nlp/lstm_seq2seq.py doesn't train with JAX backend #322

Closed
fchollet opened this issue Jun 11, 2023 · 12 comments
Labels: help wanted (Extra attention is needed)

@fchollet (Member) commented Jun 11, 2023

The model trains fine with TF. With JAX, the code runs but trains rather poorly. We need to find the root cause.

In addition, in this example, a model trained with TF cannot be reloaded with JAX.

@fchollet added the help wanted label on Jun 12, 2023
@fchollet (Member Author)

I verified that the two backends give identical forward pass numerics for the model.
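Roughly, a parity check like this can be scripted (a sketch, not the exact script used; the saved-model filename and input shapes below are placeholders):

```python
# Sketch: run once with KERAS_BACKEND=tensorflow and once with
# KERAS_BACKEND=jax, then compare the two dumps with np.allclose().
import numpy as np
import keras

backend = keras.backend.backend()
model = keras.models.load_model("s2s_model.keras", compile=False)

# Fixed, seeded inputs so both runs see identical data.
rng = np.random.default_rng(0)
encoder_in = rng.random((4, 20, 71)).astype("float32")  # placeholder shapes
decoder_in = rng.random((4, 30, 93)).astype("float32")

out = model.predict([encoder_in, decoder_in], verbose=0)
np.save(f"forward_{backend}.npy", out)

# After both runs:
#   a = np.load("forward_tensorflow.npy")
#   b = np.load("forward_jax.npy")
#   print(np.allclose(a, b, atol=1e-5))
```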

The saving issue was unrelated and I've now fixed it.

So the difference must lie in either the initializers or the optimization process.

@fchollet (Member Author)

I ruled out initialization, so it can only really be an optimizer or trainer issue.
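One way to perform that kind of elimination, as a sketch (the weights filename is a placeholder, and the fit arguments follow the example script's variable names):

```python
# Run 1 (KERAS_BACKEND=tensorflow): build the model and save its freshly
# initialized, untrained weights.
model.save_weights("init.weights.h5")

# Run 2 (KERAS_BACKEND=jax): build the same architecture, then load the
# TF-initialized weights so both backends start from identical parameters.
model.load_weights("init.weights.h5")
# If JAX still trains poorly from the exact same starting point,
# initialization is ruled out:
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=64,
    epochs=10,
    validation_split=0.2,
)
```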

@fchollet (Member Author)

It trains fine in torch as well.

@shivance (Collaborator)

Hi @fchollet, can I contribute here?

@fchollet (Member Author)

Absolutely! It's a pretty hard issue, I think. The problem is very non-obvious: everything looks good piecewise. TF and torch train well, but JAX trains very poorly (while still training). The code is here:

https://gist.github.com/fchollet/f0c84ecbed8441e54820df8366a5a629

@ariG23498 (Collaborator)

While working on the code, the "tensorflow" backend threw an error.

Here is the Gist to reproduce the error: https://gist.github.com/ariG23498/b8b4c0912a0a19dfe2ef8b29b3160943

@fchollet (Member Author)

The error message basically tells you that cuDNN can't be compiled to XLA. It's somewhat tricky to solve on our side: either we disable jit_compile when there's a cuDNN-enabled layer, or we don't use cuDNN when we detect we're tracing for XLA?

@fchollet (Member Author) commented Jul 14, 2023

To work around this, you can just pass jit_compile=False to compile() when using TF.
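Concretely, something like this (a sketch; optimizer, loss, and metrics as in the example script):

```python
model.compile(
    optimizer="rmsprop",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
    jit_compile=False,  # skip XLA compilation of the cuDNN LSTM kernels
)
```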

@ariG23498 (Collaborator)

> To work around this, you can just pass jit_compile=False to compile() when using TF.

Right! Now it trains using the tf backend.

It seems the same issue persists when loading the saved model. Do I also need to pass compile=False when loading the saved model (when using TensorFlow as the backend)?

@fchollet (Member Author)

> Do I also need to pass compile=False when loading the saved model (when using TensorFlow as the backend)?

You can just set jit_compile = False on the model, I think.
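That is, something like this (a sketch; the filename is a placeholder):

```python
import keras

model = keras.models.load_model("s2s_model.keras")
# Disable XLA compilation on the already-compiled model:
model.jit_compile = False
```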

@fchollet (Member Author)

@ariG23498 I have fixed it in this commit; please check that it works for you: c9bce12

@qlzh727 (Member) commented Sep 19, 2023

OK, I was able to verify that my fix in #888 resolves this issue.

See https://colab.corp.google.com/drive/1z_QDD0uX9ApLJdFTxhHYxJNejUdgowkG#scrollTo=kxjMd749C1nA. I will send a PR very soon.
