In my case, the JAX version of the program below is much slower than the autograd version.

autograd (ex1.py):

```python
import autograd.numpy as np
from autograd import elementwise_grad as egrad

dx, dy = 0, 1

def nabla4(w):
    def fn(x, y):
        return (
            egrad(egrad(egrad(egrad(w, dx), dx), dx), dx)(x, y)
            + 2 * egrad(egrad(egrad(egrad(w, dx), dx), dy), dy)(x, y)
            + egrad(egrad(egrad(egrad(w, dy), dy), dy), dy)(x, y)
        )
    return fn

def f(x, y):
    return x**4 + 2 * x**2 * y**2 + y**4

x = np.arange(10_000, dtype=np.float64)
y = np.arange(10_000, dtype=np.float64)

w = [f] * 100  # In a real program, the elements of the list are various functions.
r = [nabla4(f)(x, y) for f in w]
```
jax (ex2.py):

```python
import jax
from jax import grad, vmap
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

dx, dy = 0, 1

def nabla4(w):
    def fn(x, y):
        return (
            vmap(grad(grad(grad(grad(w, dx), dx), dx), dx))(x, y)
            + 2 * vmap(grad(grad(grad(grad(w, dx), dx), dy), dy))(x, y)
            + vmap(grad(grad(grad(grad(w, dy), dy), dy), dy))(x, y)
        )
    return fn

def f(x, y):
    return x**4 + 2 * x**2 * y**2 + y**4

x = jnp.arange(10_000, dtype=jnp.float64)
y = jnp.arange(10_000, dtype=jnp.float64)

w = [f] * 100  # In a real program, the elements of the list are various functions.
r = [nabla4(f)(x, y) for f in w]
```
The program using JAX is almost 9x slower than the version using autograd. In more complicated programs the difference is much greater. I would be grateful for suggestions on how to achieve similar (or better) performance using JAX.
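A minimal timing sketch for such a comparison (assuming the JAX script above; `block_until_ready` is needed because JAX dispatches work asynchronously, so the loop would otherwise finish before the computation does):

```python
import time

start = time.perf_counter()
r = [nabla4(f)(x, y) for f in w]
for a in r:
    a.block_until_ready()  # wait for JAX's asynchronous work to finish
print(f"elapsed: {time.perf_counter() - start:.3f} s")
```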
---
This is expected, I think. If you replace "NumPy" with "autograd" in the answer here, the discussion holds true and is relevant to this question: https://jax.readthedocs.io/en/latest/faq.html#is-jax-faster-than-numpy

The problem is that you're performing microbenchmarks of repeated short function calls, which is not a domain that JAX was designed for, and is a domain that NumPy & autograd are designed for: they don't have any ability to compile and fuse sequences of operations, so they are instead optimized for low per-call dispatch overhead.

In your benchmark, I think you'll find that if you jit-compile your repeated function call, JAX will be far faster:

```python
jit_nabla4_f = jax.jit(nabla4(f))
r = [jit_nabla4_f(x, y) for f in w]
```
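A note on measuring this: the first call to a jit-compiled function includes tracing and XLA compilation, while subsequent calls with the same argument shapes and dtypes reuse the compiled code. A sketch (assuming `nabla4`, `f`, `x`, `y` from the scripts above):

```python
import time

jit_nabla4_f = jax.jit(nabla4(f))

t0 = time.perf_counter()
jit_nabla4_f(x, y).block_until_ready()  # first call: traces and compiles
t1 = time.perf_counter()
jit_nabla4_f(x, y).block_until_ready()  # later calls hit the compilation cache
t2 = time.perf_counter()
print(f"first call: {t1 - t0:.3f} s, second call: {t2 - t1:.3f} s")
```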
---
Thank you @jakevdp. In your snippet the same compiled function is reused for every element of `w`; in my program each element is a different function, so I built a list of jit-compiled functions instead:

```python
# create list of jit-compiled functions
jit_nabla4_w = [jax.jit(nabla4(f)) for f in w]
# calculate values using jit-compiled functions in (x, y) coordinates
r = [f(x, y) for f in jit_nabla4_w]
```

But this is even slower than the previous version. This is probably also expected, since each of these jit-compiled functions is called only once to get the results.
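If the usage pattern allows it, compilation can still pay off when each compiled function is reused, e.g. across several coordinate batches. A hypothetical sketch (the extra batches here are made up for illustration):

```python
# Reuse pattern: each jit-compiled function is called many times with the
# same shapes, so its one-off compilation cost is amortized.
batches = [(x + i, y + i) for i in range(10)]  # hypothetical extra (x, y) batches
r = [g(xb, yb) for g in jit_nabla4_w for xb, yb in batches]
```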
---
Assuming the elements of `w` are different functions, e.g.:

```python
def f_1(x, y):
    return x**4 + 2 * x**2 * y**2 + y**4

def f_2(x, y):
    return jnp.sin(x)**2 + jnp.cos(y)**2

def f_3(x, y):
    return -jnp.sin(x) * jnp.sin(y)

def f_4(x, y):
    return jnp.sin(y) - 1/9 * x**3 + 1/2

w = [f_1, f_2, f_3, f_4]
r = [nabla4(f)(x, y) for f in w]
```

the script using JAX executes in 2851 milliseconds without jit. Using jit as below:

```python
w = [f_1, f_2, f_3, f_4]
jit_nabla4_w = [jax.jit(nabla4(f)) for f in w]
r = [f(x, y) for f in jit_nabla4_w]
```

the script executes in 1376 milliseconds. The equivalent autograd version is still faster than either variant. In my real program the functions in `w` are more numerous and more complicated. These microbenchmarks appear to confirm the information contained in the JAX documentation (FAQ). It looks like it will be difficult to take advantage of JAX in my case. Given the execution time and CPU load, the current implementation is essentially impossible to use in practice as a replacement for autograd.
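One more variant that might be worth trying (a sketch, not benchmarked here): since the list of functions is fixed, the whole loop can be traced into a single jit-compiled program, so XLA compiles and dispatches once rather than once per function:

```python
@jax.jit
def nabla4_all(x, y):
    # w is closed over, so all of its functions are traced into one XLA program
    return [nabla4(f)(x, y) for f in w]

r = nabla4_all(x, y)
```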
---

Yes, this seems consistent with the discussion at the link in my first response above. These microbenchmarks are in a regime where it's not surprising that autograd is faster than JAX: i.e. individually dispatched small array operations on CPU.
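To illustrate the per-dispatch overhead in question, a rough self-contained sketch (numbers will vary by machine):

```python
import time
import numpy as onp
import jax.numpy as jnp

a = onp.ones(100)
b = jnp.ones(100)

def per_call_seconds(fn, n=10_000):
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

# Each iteration dispatches one tiny add; NumPy's per-call overhead is much
# smaller than un-jitted JAX's, which is the regime these benchmarks hit.
t_numpy = per_call_seconds(lambda: a + a)
t_jax = per_call_seconds(lambda: (b + b).block_until_ready())
print(f"NumPy: {t_numpy * 1e6:.1f} us/op, JAX (no jit): {t_jax * 1e6:.1f} us/op")
```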