On AWS Organizations with many accounts populate fails. #112

Open
arnvid opened this issue Sep 20, 2023 · 11 comments

@arnvid

arnvid commented Sep 20, 2023

We are seeing the error TooManyRequestsException when calling the ListAccountRoles operation for our AWS Organization.

cmd line used:
aws-sso-util configure populate -r eu-west-1 --force-refresh -u https://d-xxxxxxxxxx.awsapps.com/start

Logging in to https://d-xxxxxxxxx.awsapps.com/start
Login with IAM Identity Center required.
Attempting to open the authorization page in your default browser.
If the browser does not open or you wish to use a different device to
authorize this request, open the following URL:

https://device.sso.eu-west-1.amazonaws.com/

Then enter the code:

XXXX-XXXX

Gathering accounts and roles
Traceback (most recent call last):
  File "/Users/arnvid/.local/bin/aws-sso-util", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/aws_sso_util/populate_profiles.py", line 342, in populate_profiles
    response = client.list_account_roles(**list_role_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arnvid/.local/pipx/venvs/aws-sso-util/lib/python3.11/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.TooManyRequestsException: An error occurred (TooManyRequestsException) when calling the ListAccountRoles operation (reached max retries: 4): HTTP 429 Unknown Code

@iainelder

@arnvid How often does it happen to you?

The biggest identity center instance I work with just now gives me about 200 roles.

Here aws-sso-util sometimes gives me the same error. It almost always works when I retry the command.

@iainelder

As far as I can tell aws-sso-util doesn't do anything that should obviously exceed a rate limit.

It instantiates the SSO client with implicit default retry handling.

config = botocore.config.Config(
    region_name=instance.region,
    signature_version=botocore.UNSIGNED,
)
client = session.create_client("sso", config=config)

It starts a loop to call ListAccountRoles.

while True:
    response = client.list_account_roles(**list_role_args)

It continues the loop until there are no more result pages.

    next_token = response.get("nextToken")
    if not next_token:
        break
    else:
        list_role_args["nextToken"] = response["nextToken"]
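Put together, the fragments above amount to a loop like the following. This is a minimal sketch, not the exact aws-sso-util source: the function name `list_roles_for_account` and the `FakeClient` stand-in are hypothetical, added only so the example runs without AWS access. It shows that every page costs one ListAccountRoles call, issued back to back with no pause in between.

```python
def list_roles_for_account(client, access_token, account_id):
    """Collect every role for one account, following nextToken pages."""
    list_role_args = {"accessToken": access_token, "accountId": account_id}
    roles = []
    while True:
        response = client.list_account_roles(**list_role_args)
        roles.extend(response.get("roleList", []))
        next_token = response.get("nextToken")
        if not next_token:
            break
        list_role_args["nextToken"] = next_token
    return roles


class FakeClient:
    """Hypothetical stand-in for the SSO client: serves fixed pages, counts calls."""

    def __init__(self, pages):
        self.pages = pages
        self.calls = 0

    def list_account_roles(self, **kwargs):
        self.calls += 1
        # The fake uses the page index as its pagination token.
        index = int(kwargs.get("nextToken", 0))
        response = {"roleList": self.pages[index]}
        if index + 1 < len(self.pages):
            response["nextToken"] = str(index + 1)
        return response


client = FakeClient([
    [{"roleName": "Admin"}, {"roleName": "ReadOnly"}],
    [{"roleName": "Billing"}],
])
roles = list_roles_for_account(client, "token", "111111111111")
print(len(roles), client.calls)
```

Since the loop runs once per account, the total number of calls is at least the number of accounts, which is why a large organization would be more exposed to any per-second limit.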

The Identity Center documentation says its APIs have a collective throttle maximum of 20 transactions per second. I'm unsure what that means in practice. Does that limit apply to all users of the ListAccountRoles API? It seems like a low limit.

@iainelder

You may be able to avoid the throttling errors by setting environment variables to control the SDK retry behavior.

I'd try something like this:

export AWS_RETRY_MODE=standard AWS_MAX_ATTEMPTS=100

The standard retry mode classes HTTP status code 429 as a transient error and so would automatically retry.
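For readers unfamiliar with what retrying a throttled call involves, here is a generic sketch of exponential backoff with jitter. It is an illustration only and is not botocore's internal implementation: `ThrottledError`, `call_with_backoff`, and the delay values are all hypothetical names and numbers chosen for the example.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for botocore's TooManyRequestsException."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(), retrying on ThrottledError with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the throttling error.
            # Wait a random fraction of an exponentially growing cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * cap)


# Demo: a call that is throttled twice and then succeeds.
attempts = {"count": 0}
delays = []


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError("HTTP 429")
    return "ok"


# sleep and rng are replaced so the demo is instant and deterministic.
result = call_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```

Raising the attempt count, as `AWS_MAX_ATTEMPTS=100` does for the SDK, simply gives a loop like this more chances to get past a burst of throttling.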

@arnvid
Author

arnvid commented Sep 25, 2023

@arnvid How often does it happen to you?

The biggest identity center instance I work with just now gives me about 200 roles.

Here aws-sso-util sometimes gives me the same error. It almost always works when I retry the command.

It happens every time on our production SSO. About 396 profiles before adding PIM'd roles.

@arnvid
Author

arnvid commented Sep 25, 2023

With these added I can get through:
➜ ~ export AWS_RETRY_MODE=standard
➜ ~ export AWS_MAX_ATTEMPTS=100

Gathering accounts and roles
Writing 399 profiles to /Users/arnvid/.aws/config

@iainelder

Thanks for confirming that those environment variables allow you to write the profiles.

And thanks for sharing info about the number of roles you have. My guess is that it's more likely to happen with a longer list of roles.

I think the next step would be to set up a lab environment with a variable number of roles between 100 and 1000 and see whether it's more likely at the bigger end of the scale.

If someone can reproduce the throttling error in a lab environment then maybe they could adjust the paging behavior to work without needing the user to set any environment variables.

@iainelder

iainelder commented Oct 20, 2023

Someone reported the same problem in #97 (comment).

Earlier I proposed this solution:

If someone can reproduce the throttling error in a lab environment then maybe they could adjust the paging behavior to work without needing the user to set any environment variables.

Nice idea, but it sounds like a lot of work to compensate for bad API behavior on the AWS side.

I propose we configure the client that calls ListAccountRoles with the same retry behavior effected by the environment variables so that no one has to think about this.
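A minimal sketch of what that could look like, assuming the client is built the way the earlier snippet shows. The helper name `create_sso_client` and the `RETRY_SETTINGS` constant are hypothetical and not taken from aws-sso-util. In botocore's `Config`, the `retries` key `total_max_attempts` is the counterpart of `AWS_MAX_ATTEMPTS`, and `mode` is the counterpart of `AWS_RETRY_MODE`.

```python
# Same values as AWS_RETRY_MODE=standard and AWS_MAX_ATTEMPTS=100.
RETRY_SETTINGS = {"mode": "standard", "total_max_attempts": 100}


def create_sso_client(session, region):
    """Create an unsigned SSO client with explicit retry settings.

    `session` is assumed to be a botocore.session.Session.
    """
    # Imported here so the settings above can be inspected without botocore installed.
    import botocore
    from botocore.config import Config

    config = Config(
        region_name=region,
        signature_version=botocore.UNSIGNED,
        retries=RETRY_SETTINGS,
    )
    return session.create_client("sso", config=config)
```

Whether 100 attempts is the right ceiling is an open question; the value here only mirrors the workaround that was confirmed to work earlier in this thread.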

@benkehoe
Owner

I'm nearing the end of my time off, and I plan on fully re-engaging with all of my projects, but realistically it means nothing is going to be addressed until early next year.

@benkehoe
Owner

Finally back to this. I think the right way to solve this is to spread out the calls to match the right API rate limit. Do we know what that is?
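Spreading out the calls could be done with a small pacing helper like the one sketched here. This is an assumption-laden illustration, not code from aws-sso-util: the `Pacer` and `FakeClock` names are hypothetical, and the rate used in the demo is arbitrary because the correct limit is exactly what is unknown.

```python
import time


class Pacer:
    """Space calls at least 1/rate seconds apart (a simple pacing limiter)."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # No call made yet, so the first one is free.

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


class FakeClock:
    """Hypothetical clock for the demo: sleeping just advances the time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Demo: five calls at 4 per second are spread over four intervals.
clock = FakeClock()
pacer = Pacer(rate_per_second=4, clock=clock.time, sleep=clock.sleep)
for _ in range(5):
    pacer.wait()
print(clock.now)
```

In real use, `pacer.wait()` would be called immediately before each `client.list_account_roles(...)` call, with the default clock and sleep functions.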

@iainelder

The Identity Center Quotas page says only this about rates:

IAM Identity Center APIs have a collective throttle maximum of 20 transactions per second (TPS). The CreateAccountAssignment has a maximum rate of 10 outstanding async calls. These quotas cannot be changed.

It's not clear to me whether 20 TPS applies only to the instance APIs or also to the Portal APIs.

Even if it does apply to the Portal, a single client's limit can be a lot less than 20 TPS.

It's not clear to me what "collective throttle maximum" means. Is it one quota for all clients of one Identity Center instance?

Any way to get a clarification from the Identity Center service team on this?

@iainelder

iainelder commented May 30, 2024

a single client's limit can be a lot less than 20 TPS.

I haven't measured that. It was just a guess and it may be wrong.

For what it's worth, Granted uses a rate limit of 20 TPS to call ListAccountRoles.

// Setting the rate limit to 20 since IAM Identity Center APIs have a throttle maximum of 20 transactions per second (TPS) (https://docs.aws.amazon.com/singlesignon/latest/userguide/limits.html)
rl := uberratelimit.New(20)

I can't read Go well enough to understand how it handles throttling errors.
