
GaussianProcessRegression() optimize does not work in a subprocess #645

Open
ioananikova opened this issue Oct 26, 2022 · 5 comments
Labels: bug (Something isn't working)

@ioananikova

Describe the bug
When a pool of processes is used to execute calls (for example with concurrent.futures.ProcessPoolExecutor), the optimize() method of GaussianProcessRegression() hangs and never finishes. More specifically, it hangs inside evaluate_loss_of_model_parameters().

To reproduce
Steps to reproduce the behaviour:

  1. Create a pool of processes
  2. Create a GPR model inside one of the worker processes
  3. Update the model
  4. Try to optimize the model (this is where it hangs)

A minimal reproducible code example is attached below to illustrate the problem (rename from .txt to .py to run); a sketch of the same pattern follows the attachment.
test_concurrent_trieste.txt
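
For reference, here is a minimal sketch of the failing pattern, based on the steps above (the objective and model setup are illustrative, not necessarily identical to the attached file):

# Failing pattern: trieste/TensorFlow are imported at module level in the parent
# process, and the model is then built, updated and optimized in a worker process.
import concurrent.futures

import trieste
from trieste.models.gpflow import GaussianProcessRegression, build_gpr
from trieste.objectives.single_objectives import Branin

def run_in_subprocess(num_initial_points):
    search_space = Branin.search_space
    observer = trieste.objectives.utils.mk_observer(Branin.objective)
    initial_data = observer(search_space.sample_halton(num_initial_points))

    gpflow_model = build_gpr(initial_data, search_space, likelihood_variance=1e-7)
    model = GaussianProcessRegression(gpflow_model)

    model.update(initial_data)
    model.optimize(initial_data)  # hangs here, in evaluate_loss_of_model_parameters()

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        executor.map(run_in_subprocess, [10])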

Expected behaviour
The expected behaviour is that optimize() behaves as it would in the main process (rather than a subprocess). Usually this step takes less than a second to finish.

System information

  • OS: Ubuntu-20.04 (in WSL), on Windows 10
  • Python version: 3.9.9
  • Trieste version: 0.13.0 (installed via pip)
  • TensorFlow version: 2.10.0
  • GPflow version: 2.6.3

Additional context
Even when the import statements are placed inside the subprocess, it fails.

@ioananikova added the bug label on Oct 26, 2022
@uri-granta
Collaborator

Confirmed that this is still broken with the latest version; it appears to be hitting some sort of deadlock.

@uri-granta
Collaborator

This is somehow connected to the use of tf.function compilation. Disabling tracing with tf.config.run_functions_eagerly(True) allows the code example to run (though at the obvious expense of executing everything eagerly each time). Will investigate further.
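
For anyone needing an interim workaround, a minimal sketch (assuming the flag is set at the top of the script, before any models are built or optimized):

import tensorflow as tf

# Disable tf.function tracing so everything runs eagerly; this avoids the hang
# at the cost of slower, eager execution.
tf.config.run_functions_eagerly(True)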

@uri-granta
Collaborator

It's also somehow connected to something trieste or one of its dependent libraries does:

# COMMENTING OUT EITHER import trieste OR @tf.function MAKES THIS PASS!
import concurrent.futures
import tensorflow as tf
import trieste

@tf.function
def say_hi():
    tf.print("hi")

def concurrency_test(n):
    print("I'm going to say hi!")
    say_hi()

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        executor.map(concurrency_test, [10])

@uri-granta
Collaborator

OK, so it looks like this is due to some state initialisation performed by TensorFlow when it is first called. Replacing import trieste with tf.constant(42) or similar in the example above also causes the hang.

The solution is to avoid importing trieste until you're inside the subprocess:

import concurrent.futures

WORKERS = 1

def test_concurrent(num_initial_points):
    from trieste.objectives.single_objectives import Branin
    import trieste
    from trieste.models.gpflow import GaussianProcessRegression, build_gpr
    print(f'num_initial_points: {num_initial_points}')
    branin_obj = Branin.objective
    search_space = Branin.search_space
    observer = trieste.objectives.utils.mk_observer(branin_obj)

    initial_query_points = search_space.sample_halton(num_initial_points)
    initial_data = observer(initial_query_points)
    print('initial data created')

    gpflow_model = build_gpr(initial_data, search_space, likelihood_variance=1e-7)
    model = GaussianProcessRegression(gpflow_model)
    print('model created')

    model.update(initial_data)
    print('model updated')
    model.optimize(initial_data)
    print('model optimized')


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=WORKERS) as executor:
        executor.map(test_concurrent, [10])
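
A possible alternative sketch (untested here) is to keep the imports at the top level but create the pool with the "spawn" start method, so each worker starts from a fresh interpreter rather than a forked copy of the parent's TensorFlow state; mp_context is a standard concurrent.futures parameter. Replacing the __main__ block above:

import concurrent.futures
import multiprocessing

if __name__ == "__main__":
    # "spawn" starts each worker in a fresh Python interpreter instead of forking
    # the parent, so no partially initialised TensorFlow state is inherited.
    ctx = multiprocessing.get_context("spawn")
    with concurrent.futures.ProcessPoolExecutor(max_workers=WORKERS, mp_context=ctx) as executor:
        executor.map(test_concurrent, [10])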

@uri-granta
Collaborator

I'll see whether we can document this anywhere. Does this solve your issue? (if you can remember back to October 2022!)
