Gateway fetching uplink indefinitely #1405

Closed
Nicoowr opened this issue Jan 19, 2022 · 11 comments

Comments

@Nicoowr

Nicoowr commented Jan 19, 2022

Last versions in which it did not occur

    "@apollo/federation": "0.29.0",
    "@apollo/gateway": "0.38.0",

Current versions

    "@apollo/gateway": "0.45.1",
    "@apollo/subgraph": "0.1.5",

Set-up

Everything is hosted on AWS Lambda (Gateway & Subservices)

Expected behavior

Setting schemaConfigDeliveryEndpoint: undefined in gateway config should keep the old behavior, namely not fetching the supergraph from uplink.

Actual behavior

Setting schemaConfigDeliveryEndpoint: undefined does not always prevent the gateway from fetching the supergraph from Uplink. When it does fetch, the POST requests do not succeed and the gateway keeps fetching until the Lambda times out:

Screenshot 2022-01-19 at 16 59 51

This might be linked to #949
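
One general JavaScript detail is relevant to the `schemaConfigDeliveryEndpoint: undefined` setting described above: whether an explicit `undefined` disables an option or falls back to a default depends on how the receiving code merges its options. The sketch below is only an illustration of that language behavior. The `HypotheticalConfig` type, the `DEFAULT_ENDPOINT` value, and both functions are assumptions made up for this example and are not taken from `@apollo/gateway`.

```typescript
// Hypothetical option shape and default value, invented for this example.
interface HypotheticalConfig {
  endpoint?: string | undefined;
}

const DEFAULT_ENDPOINT = "https://uplink.example.invalid/";

// Destructuring defaults apply whenever the value is undefined, so an
// explicit `endpoint: undefined` is treated exactly like an omitted key.
function resolveWithDestructuring(config: HypotheticalConfig): string | undefined {
  const { endpoint = DEFAULT_ENDPOINT } = config;
  return endpoint;
}

// Object spread copies the key even when its value is undefined, so an
// explicit `endpoint: undefined` overrides the default.
function resolveWithSpread(config: HypotheticalConfig): string | undefined {
  const merged = { endpoint: DEFAULT_ENDPOINT, ...config };
  return merged.endpoint;
}
```

With the first style, `{ endpoint: undefined }` and `{}` both resolve to the default. With the second, only `{}` does. Which style a given gateway version uses would have to be checked in its source.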

@glasser
Member

glasser commented Jan 26, 2022

Just to confirm: you are trying to use the old Google Cloud Storage-based system for getting schemas in your server instead of the newer Uplink system? This behavior was removed in @apollo/[email protected], as mentioned in the changelog.

Can you help us understand why Uplink doesn't work for you? The Uplink system gives us the ability to manage permissions on your graphs in a more reliable and dynamic way, and has allowed us to provide multi-cloud support so that Uplink continues to work even when one of our cloud vendors has a global failure.

@glasser glasser closed this as completed Jan 26, 2022
@Nicoowr
Author

Nicoowr commented Jan 27, 2022

Hi @glasser
Actually we'd like to use the old behavior of the gateway, namely when it does the composition by itself. But setting schemaConfigDeliveryEndpoint: undefined does not seem to work anymore.

We'd be glad to use Uplink, but using the new uplink endpoints (or even the former one) led to timeout problems like the one mentioned in this issue, or the one here: #949

I have no idea why it behaves like this, perhaps it's due to lambda runtime but it's hard to say...

@trevor-scheer
Member

@Nicoowr that's very surprising to hear. In its former mode of behavior the gateway had to perform a series of fetches (literally one fetch waiting for the next) to the network along with the actual composition before it was ready to serve requests. With uplink it gets to skip all of that and perform just one fetch.

If you can provide some additional details, such as where time is being lost when using Uplink, that could be helpful. Do you know what your current Lambda timeout is?
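
The difference described above, a chain of dependent fetches plus composition versus a single fetch, can be put into rough arithmetic. This is a minimal sketch with made-up numbers: the function names, the count of three fetches, and every latency value are assumptions for illustration, not measurements of the gateway.

```typescript
// Startup cost when each fetch must wait for the previous one and
// composition runs afterwards: the latencies add up.
function chainedStartupMs(fetchLatenciesMs: number[], compositionMs: number): number {
  const networkMs = fetchLatenciesMs.reduce((total, ms) => total + ms, 0);
  return networkMs + compositionMs;
}

// Startup cost when a single fetch returns an already composed supergraph.
function singleFetchStartupMs(fetchLatencyMs: number): number {
  return fetchLatencyMs;
}

// Illustrative values only: three dependent fetches of 200 ms each
// followed by 150 ms of composition, versus one 200 ms fetch.
const chainedMs = chainedStartupMs([200, 200, 200], 150);
const singleMs = singleFetchStartupMs(200);
```

Under these assumed values the chained path costs 750 ms and the single fetch costs 200 ms, which is why slower startup with Uplink is unexpected.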

@Nicoowr
Author

Nicoowr commented Jan 28, 2022

@trevor-scheer Our lambda timeout is set to 28s, for both gateway and subservices.
AFAIK, the gateway does not even call the subservice since it's trapped in the loop you can see on the screenshot. It's very weird because every call to the uplink endpoint has a 200 status, but the gateway keeps polling 🤔

@trevor-scheer
Member

@Nicoowr thanks for the extra info. I don't think a 200 is conclusive, but this does seem to be an issue for you and others so I think we have some more digging to do. Is there a way for you to share what's in those responses from Uplink?

My first suspicion is that there might be some actual errors preventing the gateway from successfully starting.
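
On why a 200 is not conclusive: a GraphQL endpoint can answer with HTTP 200 and still report failures in an `errors` array in the response body. Below is a minimal sketch of a check that looks at both. The `inspectGraphQLResponse` helper and its `GraphQLResponseBody` type are hypothetical names for this example, and the body shape is the generic GraphQL response shape, not a confirmed description of what Uplink returns.

```typescript
// Generic shape of a GraphQL response body (assumed, not Uplink-specific).
interface GraphQLResponseBody {
  data?: unknown;
  errors?: Array<{ message: string }>;
}

// Returns a list of problems; an empty list means the response looks usable.
function inspectGraphQLResponse(status: number, body: GraphQLResponseBody): string[] {
  const problems: string[] = [];
  if (status < 200 || status >= 300) {
    problems.push(`HTTP status ${status}`);
  }
  // Errors can be present even when the HTTP status is 200.
  for (const error of body.errors ?? []) {
    problems.push(`GraphQL error: ${error.message}`);
  }
  if (body.data === undefined || body.data === null) {
    problems.push("no data in response");
  }
  return problems;
}
```

Logging the output of a check like this for each response would show whether the 200s seen in the screenshot carry errors in their bodies.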

@trevor-scheer
Member

This should be resolved via #1503 / #1504 (releasing @apollo/[email protected] as we speak)

@trevor-scheer
Member

I should backpedal a bit here - #1503 does resolve an issue that's demonstrated in your screenshot (gateway shouldn't send 7x requests per cycle when it's getting 200s). #949 seems like a completely different problem set that might still be blocking successfully using Uplink.

In any case, I hope you try out the new version and report back here with results (good or bad!).
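
The expectation stated above, one request per cycle when the response is a success and extra requests only after failures, can be sketched as follows. This is not the change made in #1503. `runPollCycle`, `FetchResult`, and the `maxAttempts` parameter are hypothetical, and the fetch is modeled as a synchronous callback to keep the example short.

```typescript
// Hypothetical result of one request to a schema endpoint.
type FetchResult = { ok: true; schema: string } | { ok: false; reason: string };

// One polling cycle: stop at the first success, retry only after a
// failure, and never send more than maxAttempts requests.
function runPollCycle(
  fetchOnce: () => FetchResult,
  maxAttempts: number
): { requests: number; schema?: string } {
  let requests = 0;
  while (requests < maxAttempts) {
    requests += 1;
    const result = fetchOnce();
    if (result.ok) {
      return { requests, schema: result.schema };
    }
  }
  // Every attempt failed; report how many requests were sent.
  return { requests };
}
```

With a callback that always succeeds, a cycle sends exactly one request. Only a run of failures pushes the count toward `maxAttempts`.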

@Nicoowr
Author

Nicoowr commented Feb 11, 2022

@trevor-scheer Thank you very much for your responsiveness! I'll try it ASAP and tell you how it goes :)

@Nicoowr
Author

Nicoowr commented Feb 21, 2022

@trevor-scheer It seems the new version causes intermittent errors that I cannot reliably reproduce: Cannot convert undefined or null to object.
It does not happen every time, which is really weird. I'll provide more information ASAP.
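
For context on the message itself: in Node.js (V8), `Cannot convert undefined or null to object` is the `TypeError` raised when a built-in such as `Object.keys` is handed `undefined` or `null`. The sketch below reproduces that and shows a guard. Both helper names are hypothetical, and nothing here establishes where the error originates in this setup.

```typescript
// Throws a TypeError when value is undefined or null, because
// Object.keys cannot convert those to an object.
function countKeysUnsafe(value: unknown): number {
  return Object.keys(value as object).length;
}

// Guarded variant: treats undefined and null as "no keys".
function countKeysSafe(value: unknown): number {
  if (value === undefined || value === null) {
    return 0;
  }
  return Object.keys(value as object).length;
}
```

An error like this appearing only some of the time would be consistent with a value that is occasionally missing, for example a response field that is absent on some requests, but a stack trace is needed to confirm that.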

@glasser
Member

glasser commented Feb 21, 2022

@Nicoowr Do those come with stack traces?

@Nicoowr
Author

Nicoowr commented Feb 21, 2022

Well, it's actually the client making the request (an SNS subscriber in this case) that sometimes throws this kind of error:
Screenshot 2022-02-21 at 22 02 00
I've checked the gateway logs and it's hard to find anything relevant. What I do know is that if I revert to an unmanaged schema, everything works.

It's not much information, I'll try to provide more asap.
