Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.57 #34

Closed
wants to merge 6,120 commits into from
Closed

1.57 #34

wants to merge 6,120 commits into from

Conversation

rschu1ze
Copy link
Member

No description provided.

drfloob and others added 30 commits March 8, 2023 13:11
…et the channel supply it. (#31867)

When the handshaker_service_url is in "host:port" format such as it
normally is when using ALTS in GCE (in which case it comes from then
this makes no difference as the authority and the URL are the same. But
when different URLs are used, the correct authority to use is not always
the same as the URL. For example if the URL is unix:///some/path then
the correct authority is "localhost". This is correctly computed by
grpc_core::UnixResolverFactory and stored as the channel's default
authority, but we throw that away when we override the authority for
individual RPCs.

Note indeed that the majority of other callers of grpc_channel_create_*
pass nullptr for the host/authority argument.
To support TPC feature for BYOID (3PI), we need to remove the validation
the pattern of impersonation endpoints, sts endpoints and token info
endpoints since they are different in TPC regions.

A security review is already passed at b/261634871

<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
This fixes a bug where connections cannot be made in IPv4-only
environments. To test, hard-code `IsIpv6LoopbackAvailable` to return
false.

Example Error:
`
D0309 00:29:49.514359445 235 tcp_client.cc:67] (event_engine)
EventEngine::Connect Status: INTERNAL: socket: Address family not
supported by protocol
`

This can also be reproduced in gRPC's benchmark environment, which does
not have IPv6 enabled.
…2569)

With iomgr, this test is effectively rate limited by ExecCtx and the
single thread running pollset_work, which results in thousands of tiny
writes happening before every read. A small set of _synchronous_ 8k
reads then dominate the read-side of the test. This is an efficient
balance.

With the Windows EventEngine, the fully asynchronous, multi-threaded
reads and writes end up alternating roughly 1:1, meaning that a read
callback is executed for every tiny handful of bytes, tens of thousands
of times. Compared to the Posix EventEngine, without things like TCP_INQ
and/or recvmsg's timeout, I don't know of any great signal for how much
data can safely be received in a batch (e.g., we don't want to wait for
data that will never come, and we don't want to run callbacks for 2
bytes over and over again if we have KB in the pipe).

I believe the Windows EventEngine is WAI. I can significantly improve
this test performance by artificially slowing the reader down (adding a
>= 1ms sleep), but I believe that improves this use case to the
detriment of all others.
Noticed here:
https://source.cloud.google.com/results/invocations/779f3614-42bd-44bb-a00d-ab56f9749095/targets/%2F%2Ftest%2Fcore%2Fend2end:h2_full_test@retry_cancel4@poller%3Depoll1@experiment%3Devent_engine_client/log

A race between starting reading and transport close.

Fix: move the closure setting inside the combiner, and check if the
transport is closed at that time.

<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
…lts_concurrent_connectivity_test (#32585)

This check only works if all handshake RPCs have an OK status, and it's
racey e.g. if the client is cancelling handshake RPCs (being when an RPC
is cancelled, termination of the RPC at the client is asynchronous from
termination at the server, so the client can resume the queue before the
server RPC completes).
First step in the modernization of our RBE stack (see
go/rbe-tech-debt-notes).

- Get rid of the deprecated rbe_autoconfig and start using
[rbe_configs_gen](https://github.com/bazelbuild/bazel-toolchains#rbe_configs_gen---cli-tool-to-generate-configs)
+ check in the generated toolchain configs.
- Switch from marketplace.gcr.io/google/rbe-ubuntu16-04 to
marketplace.gcr.io/google/rbe-ubuntu18-04 (this image is still not owned
by us, but at least it's newer and demonstrates how a switch to a newer
docker image is done).
- provide script for generating the linux RBE toolchain configs.
- cleanup RBE configuration in the bazelrc files used for remote build
- sort source files to ensure stable ordering
- generate one source file per line

together this should produce diffs that are much more readable by humans
when sources get added/removed to/from protobuf (and
make_grpcio_tools.py is used to regenerate).
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
A step toward #14016.

---------

Co-authored-by: ctiller <[email protected]>
PSM Interop: Local dev various improvements

- Cleanup resources on ctrl+c
- Add startup probes to address the issue with port forwarding starting
before the workload listens on a port
- Remove misleading restartPolicy: it's silently ignored by k8s
- Extra debug message with port-forwarding command
…script as-is (#32586)

Each `run.sh` should just pass those parameters through to the interop
client/server binaries as-is.

Corresponding framework PR:
GoogleCloudPlatform/grpc-gcp-tools#28
It is reported in grpc/grpc#32356 that there
is a race on vptr for `UnimplementedAsyncRequest` which would cause
crashes for multi-threaded server if clients send unimplemented RPC
request to the server.

The cause is that the server requests a call for
`UnimplementedAsyncRequest` in its base class `GenericAsyncRequest` when
the `vptr` still points to the base class's `vtable`. If the call went
in and another server thread picks up the tag before the `vptr` points
back to the derived class's `vtable`, it would call the wrong virtual
function and also this is a data race. This fix makes the request of the
call inside the derived class's constructor.

<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
… for protobuf upgrade. (#32590)

`bazel query deps(//src/proto/...)` seems unnecessary (regenerated
projects are identical) and causes trouble with protobuf 22.x (since it
basically breaks `tools/buildgen/generate_projects.sh` run and that
makes upgrade experiments painful).
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
Note that there is no behavior change associated with this PR. In other
words, folks that use `GRPC_ARG_ENABLE_PER_MESSAGE_DECOMPRESSION` and
`GRPC_ARG_ENABLE_PER_MESSAGE_COMPRESSION` will still see the same
behavior as before.

The actual change - The compression filter will always be added to the
filter stack for HTTP transports even if it is a no-op due to the above
channel args.


<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
…s WSARecv operations (#32563)

Built on grpc/grpc#32560

When calling EventEngine::Read, if a synchronous WSARecv call completes
successfully and 1) the read buffer is not full, and 2) the stream
remains open, then the endpoint will now chain execution of more
synchronous WSARecvs. The chain is broken and the on_read callback is
called when either there are errors, the next call would block, the
buffer is full, or the stream is closed.

Something like this is helpful to prevent excessive read callback
execution under a flood of tiny payloads, presuming messages are not
being combined as one would usually expect (see
`//test/core/iomgr:endpoint_pair_test`, and Nagle's algorithm).
- Increase kubernetes library default for urlib3 retries to 10
- Add custom retry logic to all API calls made by framework.k8s

Custom retry logic handles various errors we're experienced over
two years, and based on ~140 failure reports:

1. Errors returned by the k8s API server itself:
  - 401 Unauthorized
  - 409 Conflict
  - 429 Too Many Requests
  - 500 Internal Server Error
2. Connection errors that might indicate k8s API server is temporarily
   unavailable (such as a restart, upgrade, etc):
  - All `NewConnectionError`s, f.e. "Connection timed out",
    "Connection refused"
  - All "connection aborted" `ProtocolError`s, f.e. "Remote end
    closed connection  without response", "Connection reset by peer"

ref b/178378578, b/258546394
Now `messages.proto` requires `xds/v3/orca_load_report.proto`.
The dependency introduced in grpc/grpc#32524.

ref b/273575071
yashykt and others added 28 commits May 9, 2023 17:42
Change was created by the release automation script. See go/grpc-release
Change was created by the release automation script. See go/grpc-release
Backport #33239 to 1.55.

Co-authored-by: Craig Tiller <[email protected]>
… (#33304)

We configured TestGrid to file bug separately for each failed
sub-target, if we still fail the main target, TestGrid will fail
duplicate bugs.




<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
…3320)

This is a hack to get around an issue on Apple devices caused by the
PosixEventEngine's `poll` poller not supporting fork. This PR disables
the EventEngine poller entirely in Python builds. It will therefore
prevent the release of the EventEngine generally, and prevent any
testing of EventEngine integration with gRPC Python, until the `poll`
poller is fixed.




<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->

Co-authored-by: AJ Heller <[email protected]>
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
Turns out this is needed in addition to
grpc/grpc#33320 to resolve the MacOS fork issues
for Python.

CC @sampajano

Co-authored-by: AJ Heller <[email protected]>
…(v1.55.x backport) (#33524)

Backport of #33520 to v1.55.x.
---
Follow up change of #33222.

We don't want file multiple bugs if any of the sub-tests of the
`url_map` test fails.

<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
…v1.55.x backport) (#33724)

Backport of #33699 to v1.55.x.
---
Similar to grpc/grpc-java#8982.
All other languages already to this.

Needed for easier deployment of grpc/grpc#33685.

ref b/274944592
…rt) (#33808)

Backport of #33796 to v1.55.x.
---
grpc/grpc#33699 incorrectly changed the legacy
builds to not just use the test driver from the master, but also to
build from it. This PR fixes the issue, and also updates the python job
to work use the driver from master.
*Beep boop. This is an automatically generated backport of #33738.*

This should resolve breakage on master caused by the jump to Cython
3.0.0 this morning.
Update Phusion base image to address security issues. Original PR:
#33847
Change was created by the release automation script. See go/grpc-release
@rschu1ze rschu1ze closed this Nov 15, 2023
@CLAassistant
Copy link

CLAassistant commented Nov 15, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 13 committers have signed the CLA.

✅ rschu1ze
❌ veblush
❌ ctiller
❌ apolcyn
❌ markdroth
❌ Vignesh2208
❌ yashykt
❌ gnossen
❌ XuanWang-Amos
❌ larry-safran
❌ sergiitk
❌ eugeneo
❌ drfloob
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.