Change fork to thread #20996

Queimo · 2021-12-09T18:15:51Z

I use the latest version of wandb 0.12.7 and "fork" does not exist, only "thread" and "spawn", so I think "thread" is the right choice, worked for me, at least.

Why are these changes needed?

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

When a list with mixed types is passed to tune.choice, they will be coerced to a single dtype during sampling (due to numpy.choice converting to an array internally). This behaviour is unintentional and surprising. This PR fixes this issue.

This removes unused code.

Expands the `to_torch` method for Datasets with: * An ability to choose to output a list/dict of feature tensors instead of just one (through setting `feature_columns` to be a list of lists or a dict of lists) * An ability to choose whether the label should be unsqueezed or not * An ability to pass `None` as the label (for prediction). Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes for each feature. However, as the tensor was concatenated in the end, only one dtype mattered (the biggest one). Now, this argument expects a single dtype which will be applied to the features tensor (or a list/dict if `feature_columns` is a list of list/dict of lists). Unit tests for all cases are included. Co-authored-by: matthewdeng <[email protected]>

…#21260) If we use `os.environ` to set environment variables in tests, then our tests become coupled. By using `monkeypatch`, we can safely set environment variables while ensuring our tests remain decoupled. For more information, see the [monkeypatching documentation](https://docs.pytest.org/en/6.2.x/monkeypatch.html#monkeypatching-environment-variables).

…t#21262) Inheriting from `abc.ABC` is more readable than setting the meta class to `abc.ABCMeta`. Relevant snippet from the Python 3.4 release notes: > New class ABC has ABCMeta as its meta class. Using ABC as a base class has essentially the same effect as specifying metaclass=abc.ABCMeta, but is simpler to type and easier to read. (Contributed by Bruno Dupuis in bpo-16049.) Co-authored-by: Richard Liaw <[email protected]> Co-authored-by: Matthew Deng <[email protected]>

ray-project#21352) This reverts commit fcb952e.

…ct#21347) * fix * fix test failure * Update release/nightly_tests/dataset/ray_sgd_training.py Co-authored-by: matthewdeng <[email protected]>

Co-authored-by: Amog Kamsetty <[email protected]>

…0965) Only `resource_avaialbe` and `resource_total` are used in raylet, so let's clear the rest before broadcasting.

…oject#21361)

…1295) This PR refactors several components to support switching to GCS address bootstrapping later: - Treat address from `ray.init()` and `ray` CLI as bootstrap address instead of assuming it is Redis address. - Ray client servers support `--address` flag instead of `--redis-address`. - A few other miscellaneous cleanup. Also, add a test for starting non-head node with `ray start`.

…t#21332)

Recent changes in the anyscale API rendered the current e2e script incompatible. This PR resolves these subtle API changes.

…roject#21377) This will start repro docker containers with SYS_PTRACE capabilities to enable debugging e.g. via py-spy. Additionally, default instance name tags for instance re-use will be generated using the buildkite build id and job id.

…ls"" (ray-project#21339)

We use `trial.checkpoint` to restore a perturbed trial. Currently trial.checkpoint is looking at both in-memory and persistent checkpoints to find the most recent one. The definition of "the most recent one" is based on iteration. This may no longer be a valid assumption in PBT case, considering `trial_low_quantile` may have an iter=2_persistent_checkpoint as well as a iter=1_in_memory_checkpoint (perturbed from `trial_upper_quantile`).

Fixes outdated `start_training` definitions and calls in Train logging callbacks & abstract classes.

…idation` (ray-project#21374) These two tests pass without issues on my Windows machine. Rest time out or fail.

New log4j version fixes vulnerability: * https://nvd.nist.gov/vuln/detail/CVE-2021-44832

…y-project#21232) After this change in GCS bootstrapping mode, Redis no longer starts and `address` is treated as the GCS address of the Ray cluster. Co-authored-by: Yi Cheng <[email protected]> Co-authored-by: Yi Cheng <[email protected]>

…r messages. (ray-project#21219) This PR is added to handle this comment; ray-project#20903 (comment) The PR - Unifies the multiple actor died error to a single schema. (cannot unify runtime env or creation task exception) - Improve each of actor error message to include more metadata. - Include actor information to actor death cause.

…tions for LSTM/attention for complex action spaces. (ray-project#21330)

`PublisherClient` is a more reasonable name than `SubscriberClient` since XClient means ‘client used to access X’, like GcsClient. Besides, in the current codebase we already called this client `publisher_client`(line 329/333), while the actual class name is `SubscriberClient`, this is inconsistent. https://github.com/ray-project/ray/blob/a8d7897a56d8dc2d2af550adee8cd5399c39dfb9/src/ray/pubsub/subscriber.cc#L326-L339

This PR moves the sdk to its own folder, then includes everything in `import ray.autoscaler.sdk` in ray's import path. Note: that there were circular dependencies in naively doing this because the ray core now uses constants that were defined in the autoscaler for internal kv operations (and the autoscaler similarly calls into the ray core). The solution was to move those internal kv keys into ray core constants so the imports flow (more) one way. Co-authored-by: Alex Wu <[email protected]>

There are some auto-generated streaming files, which are not removed. This PR removes them totally. Co-authored-by: 林濯 <[email protected]>

krfricke · 2022-01-26T08:39:36Z

Generally yes, but the linter did not pass (you'd have to run ./scripts/format.sh).

I had to make some changes to wandb today and incorporated your changes in the PR at #21892 - I'll add you as an coauthor to the commit and will close this once #21892 is merged, ok? No further action required on your part :-)
Thanks for contributing!

# Conflicts: # python/ray/tune/integration/wandb.py

Queimo · 2022-01-26T08:55:45Z

Ah I see, yes that's great! Sorry for the extra work, this was actually my first PR.

krfricke · 2022-01-26T09:00:53Z

I've also messed up the git merge here just now, so let's use #21892 :-)

rkooo567 assigned richardliaw Dec 14, 2021

krfricke and others added 29 commits January 3, 2022 14:36

[ci] Pin manylinux docker image (ray-project#21341)

10290ee

[tune] remove TrialExecutor.resume_trial. (ray-project#21225)

c18caa4

This removes unused code.

[CI] Add support for Black formatting (ray-project#21281)

43a9e95

Revert "WINDOWS: unskip passing runtime_env tests (ray-project#21252)" (

4581baa

ray-project#21352) This reverts commit fcb952e.

[BigDataTraining] Fix test script introduced by API change (ray-proje…

704404d

…ct#21347) * fix * fix test failure * Update release/nightly_tests/dataset/ray_sgd_training.py Co-authored-by: matthewdeng <[email protected]>

[Train] Add PrintCallback (ray-project#21261)

7efe1be

Co-authored-by: Amog Kamsetty <[email protected]>

[Core]Clear the unnecessary fields before broadcasting (ray-project#2…

b910648

…0965) Only `resource_avaialbe` and `resource_total` are used in raylet, so let's clear the rest before broadcasting.

[Serve] Fix iterator-and-mutate bug in FastAPI view (ray-project#21362)

e60a5f5

[Serve] [nit] Remove unreachable line in ActorReplicaWrapper(ray-pr…

6e77b39

…oject#21361)

[RLlib] QMIX better defaults + added to CI learning tests (ray-projec…

abd3bef

…t#21332)

[ci/release] Update to recent anyscale API changes (ray-project#21149)

aa35045

Recent changes in the anyscale API rendered the current e2e script incompatible. This PR resolves these subtle API changes.

[doc] Fix typos in serve documentation (ray-project#21379)

5aa00ba

[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, ur…

c012457

…ls"" (ray-project#21339)

[Serve] Clean up router.py (ray-project#21344)

e453837

[train] Fix start_training in logging callbacks (ray-project#21357)

3632494

Fixes outdated `start_training` definitions and calls in Train logging callbacks & abstract classes.

Unskipped test_reconfigure_with_exception & `test_deploy_handle_val…

819e034

…idation` (ray-project#21374) These two tests pass without issues on my Windows machine. Rest time out or fail.

[Java] Try to fix flaky NamespaceTest (ray-project#21370)

240e6ef

[Java] Bump log4j 2.17.0 to 2.17.1 (ray-project#21373)

8884cf0

New log4j version fixes vulnerability: * https://nvd.nist.gov/vuln/detail/CVE-2021-44832

[RLlib] Better utils for flattening complex inputs and enable prev-ac…

9e6b871

…tions for LSTM/attention for complex action spaces. (ray-project#21330)

Alex Wu and others added 3 commits January 25, 2022 14:43

Remove generated streaming pb and pom files. (ray-project#21851)

0c33ff7

There are some auto-generated streaming files, which are not removed. This PR removes them totally. Co-authored-by: 林濯 <[email protected]>

[tune/wandb] Use resume=False per default

81fa1cb

krfricke mentioned this pull request Jan 26, 2022

[tune/wandb] Use resume=False per default #21892

Merged

6 tasks

Merge remote-tracking branch 'Queimo/patch-1' into tune/wandb-resume

7d1b7fb

# Conflicts: # python/ray/tune/integration/wandb.py

Queimo requested review from alanwguo, AmeerHajAli, architkulkarni, avnishn, edoakes, ericl, gjoliver, ijrsvt, fishbone, joeybai, jovany-wang, kfstorm, pcmoritz, raulchen, richardliaw, rkooo567, robertnishihara, scv119, SongGuyang, sven1977 and wuisawesome as code owners January 26, 2022 08:53

krfricke closed this Jan 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change fork to thread #20996

Change fork to thread #20996

Queimo commented Dec 9, 2021 •

edited

Loading

krfricke commented Jan 26, 2022

Queimo commented Jan 26, 2022

krfricke commented Jan 26, 2022

Change fork to thread #20996

Change fork to thread #20996

Conversation

Queimo commented Dec 9, 2021 • edited Loading

Why are these changes needed?

Related issue number

Checks

krfricke commented Jan 26, 2022

Queimo commented Jan 26, 2022

krfricke commented Jan 26, 2022

Queimo commented Dec 9, 2021 •

edited

Loading