Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make arns of all task resources aws-partition aware #131

Merged
merged 7 commits into from
May 21, 2021

Conversation

yuan-bwn
Copy link
Contributor

@yuan-bwn yuan-bwn commented May 5, 2021

Issue #, if available: #120
Update:
Resolve previous comments.
Use boto3 patch mock for unit tests.
Use f-string for string building.
Create arn builder for providing integration resource arn.
Create tests for arn builder.

Test:
All unit tests and integ tests passed.

Description of changes:
Support different aws partitions when SDK providing resource arns. Partitions will be determined by boto3 client session running region.
Available regions are based on SFN regional availability.
There are formatting changes for test_steps.py which done by IDE. They are valid changes and will not have functionality impact.

Test:
Created new unit under test_steps suite. All units tests passed.
All integration tests passed in both us-east-1 and cn-north-1 using test account.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yuan-bwn yuan-bwn marked this pull request as draft May 5, 2021 21:20
@yuan-bwn yuan-bwn marked this pull request as ready for review May 6, 2021 18:20
Add comment to method explanation.

# Obtain matching aws partition name based on region
# Retrun "aws" as default if no region detected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups! Typo here: "Return"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I will make the change in next commit

Comment on lines 20 to 21
from stepfunctions.steps import Pass, Succeed, Fail, Wait, Choice, ChoiceRule, Parallel, Map, Task, Retry, Catch, Chain, \
utils
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we reorganize the imports so that each is on another line to make is more readable? It would also make it easier to spot the change next time a new import is added from stepfunctions.step.

Something like:

from stepfunctions.steps import (
    Catch,
    Chain,
    Choice,
    ChoiceRule,
    Fail,
    Map,
    Parallel,
    Pass,
    Retry,
    Succeed,
    Task,
    Wait,
    utils,
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the legacy across this repo and it has many places import like this. As these imports have relatively short name, to me importing same line is a better choice. Make the list longer is definitely clearer but the looking will be clumsy.
But let me know if this is hard to read. If so I can maybe divide them into 2 or 3 lines so that we have a clearer reading as well as keeping the context less bulky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add some linting (in a different PR) to enforce style preferences.

not strongly opinionated on whether we go with PEP8, PyLint, or other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. So far I think there are definitely different style existing in this repo.

Comment on lines 26 to 27
# Retrun "aws" as default if no region detected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in "Retrun". May be worth mentioning this is from the default boto3 session.

Suggested change
# Obtain matching aws partition name based on region
# Retrun "aws" as default if no region detected
# Obtain matching aws partition name based on region
"""
Returns the aws partition for the the current boto3 session.
Defaults to 'aws' if the region could not be detected.
"""

Python method comments are also usually inside the method blocks as a docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Will make the fix in new commit.


if cur_region is None:
logger.warning("No region detected for the session, will use default partition: aws")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it's worth mentioning this is for a boto3 session

Suggested change
logger.warning("No region detected for the session, will use default partition: aws")
logger.warning("No region detected for the boto3 session. Using default partition: aws")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Will do

src/stepfunctions/steps/utils.py Show resolved Hide resolved
@@ -88,10 +95,12 @@ def test_succeed_state_creation():
'Comment': 'This is a comment'
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add extra new lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra new line came from the IDE. Usually it's a 2 blank line between python methods and it's the case for other files. I kept this auto-reformat to make it meet the convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think* that's the PEP8 standard. we should enable linting on the repo at some point to enforce style preferences



# Test if boto3 session can fetch correct aws partition info from test environment
def test_util_get_aws_partition():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to a separate file for just the utils module with separate test case methods for aws and aws-cn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Will do

Comment on lines 452 to 453
aws_partition = "aws"
aws_cn_partition = "aws-cn"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not really necessary to declare variables for these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it makes the reading better if we mention what we are testing but they both will work I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty obvious to me. But this is just nitpicking

assert utils.get_aws_partition() == 'aws'
assert utils.get_aws_partition() == 'aws-cn'

Comment on lines 456 to 459
# Boto3 used either info from ~/.aws/config or AWS_DEFAULT_REGION in environment
# to determine current region. We will replace/create AWS_DEFAULT_REGION with regions in
# different aws partition to test that when regions are changed, correct partition info
# can be retrieved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of mutating environment variables, did you consider mocking boto3 to override region_name but call through to the remaining functions?

Copy link
Contributor Author

@yuan-bwn yuan-bwn May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I didn't. I came up this testing flow during my local setup and think this will satisfy the unit test need as well as straightforward for users.
I can try boto3 mock if this makes more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, you can do this with patch decorators: https://docs.python.org/3/library/unittest.mock-examples.html#patch-decorators

@patch.object(boto3.session.Session, 'region_name', 'cn-north-1')
def test_get_aws_partition_for_china_region():
  # ...

@@ -38,9 +39,9 @@ def __init__(self, state_id, wait_for_callback=False, **kwargs):
output_path (str, optional): Path applied to the state’s output after the application of `result_path`, producing the effective output which serves as the raw input for the next state. (default: '$')
"""
if wait_for_callback:
kwargs[Field.Resource.value] = 'arn:aws:states:::lambda:invoke.waitForTaskToken'
kwargs[Field.Resource.value] = 'arn:' + get_aws_partition() + ':states:::lambda:invoke.waitForTaskToken'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily blocking: I would like to at least consider class for ARN construction instead of building strings with repeated calls to get_aws_partition() inline everywhere. The AWS SDK in other languages have utilities for this, but python doesn't.

Something like:

kwards[Field.Resource.value] = ARN(service='states', resource='lambda:invoke')

Or even scoping it to service integration ARNs:

kwargs[Field.Resource.value] = ServiceIntegrationARN(service='lambda', api='invoke', integration_pattern=IntegrationPattern.WAIT_FOR_TASK_TOKEN)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitely make the code cleaner. I will look into it.
If I can not finish this quickly I will prepare a PR specific for it later.

@@ -67,9 +68,9 @@ def __init__(self, state_id, wait_for_completion=True, **kwargs):
output_path (str, optional): Path applied to the state’s output after the application of `result_path`, producing the effective output which serves as the raw input for the next state. (default: '$')
"""
if wait_for_completion:
kwargs[Field.Resource.value] = 'arn:aws:states:::glue:startJobRun.sync'
kwargs[Field.Resource.value] = 'arn:' + get_aws_partition() + ':states:::glue:startJobRun.sync'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice if we extracted the logic for generating ARNs into a utility.

additionally, consider string interpolation using f-strings rather than string concatenation, which is supported in Python 3.6+ (our current pre-requisites)

you can read more about it here

@@ -43,11 +44,11 @@ def aws_account_id():

@pytest.fixture(scope="session")
def sfn_role_arn(aws_account_id):
return "arn:aws:iam::{}:role/StepFunctionsMLWorkflowExecutionFullAccess".format(aws_account_id)
return "arn:" + get_aws_partition() + ":iam::{}:role/StepFunctionsMLWorkflowExecutionFullAccess".format(aws_account_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-strings would also help here as they're evaluated at runtime and would get you out of using both format and concatenation on the same string.

Comment on lines 20 to 21
from stepfunctions.steps import Pass, Succeed, Fail, Wait, Choice, ChoiceRule, Parallel, Map, Task, Retry, Catch, Chain, \
utils
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add some linting (in a different PR) to enforce style preferences.

not strongly opinionated on whether we go with PEP8, PyLint, or other

@@ -88,10 +95,12 @@ def test_succeed_state_creation():
'Comment': 'This is a comment'
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think* that's the PEP8 standard. we should enable linting on the repo at some point to enforce style preferences

default_region = None

# Boto3 used either info from ~/.aws/config or AWS_DEFAULT_REGION in environment
# to determine current region. We will replace/create AWS_DEFAULT_REGION with regions in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple questions here:

  1. What will happen if your assertion fails after cn-north-1 is set as the default region? will those changes persist for the next boto session because the test will bail or is there a guarantee that environment variables that are being touched will always be restored to a pristine state?
  2. does this work with named profiles and other mechanisms that can be used to establish aws credentials where the DEFAULT_REGION is not set as an environment variable?

It's generally not preferable for unit tests to have any dependencies. Extraneous processes could be running or setting the region and this test could interfere and introduce surprising behaviour by modifying them.

i know the stubber allows you to create a session for a client with a specific region, but i'm not sure what mocking the session looks like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Good question. From my understanding this will continue to fail as the cn-north-1 will be set as env var which will keep being effective till next run. To improve this specific issue I can save exceptions as non-fatal and assert at the very end of test so that changes got reversed before assertion.
  2. Yes. I tested with both environment variable and ~/.aws/config file. Can't think of other configuration methods but please let me know if you have some in your mind.
  3. I do agree the test should be on itself and should be mocked without dependency. So far I tried to mock boto3 client/session but have no luck. Will give stubber a try.

Copy link
Contributor

@wong-a wong-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments about structure, naming, and documentation. Almost there!

src/stepfunctions/steps/utils.py Outdated Show resolved Hide resolved
tests/unit/test_steps_utils.py Show resolved Hide resolved
src/stepfunctions/steps/compute.py Outdated Show resolved Hide resolved
ARN builder for task integration
"""
def resource_integration_arn_builder(service, api, integration_pattern=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This belongs in the service integration module
  2. Since this a method, let's name it with a verb instead. e.g. get/build_service_integration_arn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Will make the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) still applies

Comment on lines 28 to 41
class IntegrationServices(Enum):
Lambda = "lambda"
SageMaker = "sagemaker"
Glue = "glue"
ECS = "ecs"
Batch = "batch"
DynamoDB = "dynamodb"
SNS = "sns"
SQS = "sqs"
ElasticMapReduce = "elasticmapreduce"


class LambdaApi(Enum):
Invoke = "invoke"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these. We don't need to refer to the service or api names outside the Step classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had similar thought before I start coding.
But during the implementation, I found it's important to have the well organized service/api for development. It will not only help with auto-completion which helps accelerate coding speed, but will also help reading/changing/examining the code as we have our resource gathered in a place rather than sparsely distributed among places.
Another reason is: we have the integration pattern passed in as enum, this is limited number hard-coded param exactly same as our integrated services and their apis.
I think this will also help with organizing further expansion when we are introducing more service and apis into the repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced. Arguments for types and autocompletion make sense if they are to be used in many places. That is not the case for "lambda" and "invoke", etc.. There's no need to reference them outside the module defining the service integration steps, so don't expose them.

Another reason is: we have the integration pattern passed in as enum, this is limited number hard-coded param exactly same as our integrated services and their apis.

Not sure what you mean here

ARN builder for task integration
Args:
service(str): name of the task resource service
api(enum): api to be integrated of the task resource service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
api(enum): api to be integrated of the task resource service
api(str): api to be integrated of the task resource service

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will replace "str" with "Api" as it could provide a clearer explanation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more confusing as it references a type that doesn't exist. Keep as str. The parameter name is already api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the new commit I replaced it with "Api" which I think it is clearer than only "Api". Sorry forgot to update it here.
Let me know if that's still confusing. I can change it to "str". The reason I think "str" is not the best choice is: I implemented the API as one enum and fetch the value in the get_service_integration_arn method. "str" is not really reflecting the input.

src/stepfunctions/steps/utils.py Outdated Show resolved Hide resolved
src/stepfunctions/steps/utils.py Outdated Show resolved Hide resolved
Comment on lines 21 to 24
Lambda = "lambda"
Glue = "glue"
Ecs = "ecs"
Batch = "batch"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Lambda = "lambda"
Glue = "glue"
Ecs = "ecs"
Batch = "batch"
LAMBDA_SERVICE_NAME = "lambda"
GLUE_SERVICE_NAME = "glue"
ECS_SERVICE_NAME = "ecs"
BATCH_SERVICE_NAME = "batch"

nit: Variable names should written be in snake_case. Class names are PascalCase. Constants in CAPITAL_SNAKE_CASE. Repeats.

Batch = "batch"


class LambdaApi(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (not blocking): Not necessary to be a class

Comment on lines 70 to 72
kwargs[Field.Resource.value] = get_service_integration_arn(Lambda,
LambdaApi.Invoke,
IntegrationPattern.WaitForTaskToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation is off

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix

wong-a
wong-a previously approved these changes May 19, 2021
Copy link
Contributor

@wong-a wong-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Approval conditional on applying the docstring suggestion for get_service_integration_arn

src/stepfunctions/steps/integration_resources.py Outdated Show resolved Hide resolved
Copy link
Contributor

@shivlaks shivlaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 changes LGTM! provisional approval on adding doc-strings to classes without them.

Comment on lines +70 to +76

"""
Example resource arn: arn:aws:states:::dynamodb:getItem
"""

kwargs[Field.Resource.value] = get_service_integration_arn(DYNAMODB_SERVICE_NAME,
DynamoDBApi.GetItem)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding these through the PR, it definitely helps as a review and will benefit contributors!

Invoke = "invoke"


class GlueApi(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking: also agree that these don't need to be classes.

If you do keep them as classes, we should include docstrings that explain what they are and their purpose :)

Comment on lines +40 to +44
if integration_pattern == IntegrationPattern.RequestResponse:
arn = f"arn:{get_aws_partition()}:states:::{service}:{api.value}"
else:
arn = f"arn:{get_aws_partition()}:states:::{service}:{api.value}.{integration_pattern.value}"
return arn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice to see us leveraging f-strings!

ELASTICMAPREDUCE_SERVICE_NAME = "elasticmapreduce"


class DynamoDBApi(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all our classes should include docstrings

@StepFunctions-Bot
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-sEHrOdk7acJc
  • Commit ID: 33d624b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@yuan-bwn yuan-bwn merged commit 4f90ba3 into aws:main May 21, 2021
@yuan-bwn yuan-bwn deleted the aws_partition_main branch May 21, 2021 02:56
@wong-a wong-a mentioned this pull request May 27, 2021
wong-a added a commit that referenced this pull request May 27, 2021
Increments VERSION for release to PyPI which will include the following changes since v2.0.0:
* feature: Make Choice states chainable #132
* fix: Make service integration ARNs partition aware #131
@wong-a
Copy link
Contributor

wong-a commented May 28, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants