Cleanup AWS Signature V4 code a bit and solve the memory leak #75

Merged 16 commits on Feb 4, 2024
17 changes: 11 additions & 6 deletions CHANGELOG.md
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Fixed
- Remove deprecation warnings due to usage of `utcnow` and `utcfromtimestamp`. Thanks to [`Raphael Krupinski`](https://github.com/rafalkrupinski).
- `httpx_auth.AWS4Auth.default_include_headers` value kept growing in size every time a new `httpx_auth.AWS4Auth` instance was created with `security_token` parameter provided. Thanks to [`Miikka Koskinen`](https://github.com/miikka).

### Changed
- `httpx_auth.AWS4Auth.default_include_headers` is not available anymore, use `httpx_auth.AWS4Auth` `include_headers` parameter instead to change the list of included headers if the default does not fit your need ().
- `httpx_auth.AWS4Auth` `include_headers` values will not be stripped anymore, meaning that you can now include headers prefixed and/or suffixed with blank spaces.
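
The growth described in the "Fixed" entry is a common Python pitfall: a mutable class attribute is shared by every instance, so appending to it in `__init__` makes it grow for the life of the process. The sketch below reproduces the pattern with two hypothetical classes (`LeakyAuth` and `FixedAuth` are illustrative names, not part of `httpx_auth`); it mirrors the shape of the change rather than the library's exact code.

```python
class LeakyAuth:
    # Hypothetical class. The list lives on the class, so all instances share it.
    default_include_headers = ["host", "content-type", "date", "x-amz-*"]

    def __init__(self, security_token=None):
        if security_token:
            # Mutates the shared list, so it grows on every instantiation.
            self.default_include_headers.append("x-amz-security-token")
        self.include_headers = self.default_include_headers


class FixedAuth:
    # Hypothetical class. Each instance builds its own set.
    def __init__(self, security_token=None):
        include_headers = {"host", "content-type", "date", "x-amz-*"}
        if security_token:
            # A set also collapses duplicates, so repeated adds are harmless.
            include_headers.add("x-amz-security-token")
        self.include_headers = include_headers


for _ in range(3):
    LeakyAuth(security_token="token")
    FixedAuth(security_token="token")

leaky_size = len(LeakyAuth.default_include_headers)
fixed_size = len(FixedAuth(security_token="token").include_headers)
print(leaky_size, fixed_size)
```

The shared list gains one entry per `LeakyAuth` instance, while every `FixedAuth` instance ends up with the same five entries.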

## [0.19.0] - 2024-01-09
### Added
@@ -114,7 +119,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- `get_token` cache method now requires `on_missing_token` function args to be provided as kwargs instead of args.
- `get_token` cache method now requires `on_missing_token` parameter to be provided as a non positional argument.
- `get_token` cache method now requires `on_missing_token` parameter to be provided as a non-positional argument.
- `get_token` cache method now exposes an `early_expiry` parameter, defaulting to 30 seconds.

### Fixed
@@ -166,13 +171,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Still under development, subject to breaking changes without notice: `AWS4Auth` authentication class for AWS. Ported from [`requests-aws4auth`](https://github.com/sam-washington/requests-aws4auth) by [`Michael E. Martinka`](https://github.com/martinka).
Note that a few changes were made:
- deprecated `amz_date` attribute has been removed.
- it is not possible to provide an `AWSSigningKey` instance, use explicit parameters instead.
- it is not possible to provide a `date`. It will default to now.
- it is not possible to provide `raise_invalid_date` parameter anymore as the date will always be valid.
- Deprecated `amz_date` attribute has been removed.
- It is not possible to provide an `AWSSigningKey` instance, use explicit parameters instead.
- It is not possible to provide a `date`. It will default to now.
- It is not possible to provide `raise_invalid_date` parameter anymore as the date will always be valid.
- `include_hdrs` parameter was renamed into `include_headers`
- `host` is not considered as a specific Amazon service anymore (no test specific code).
- Each request now has it's own signing key and x-amz-date. Meaning you can use the same auth instance for more than one request.
- Each request now has its own signing key and `x-amz-date`, meaning you can use the same auth instance for more than one request.
- `session_token` was renamed into `security_token` for consistency with the underlying name at Amazon.

## [0.3.0] - 2020-05-26
29 changes: 21 additions & 8 deletions README.md
@@ -667,7 +667,7 @@ OAuth2.token_cache = JsonTokenFileCache('path/to/my_token_cache.json')

## AWS Signature v4

Amazon Web Service Signature version 4 is implemented following [Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) and [request-aws4auth](https://github.com/sam-washington/requests-aws4auth).
Amazon Web Service Signature version 4 is implemented following [Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) and [requests-aws4auth 1.0.1](https://github.com/sam-washington/requests-aws4auth) (with some changes, see below).

Use `httpx_auth.AWS4Auth` to configure this kind of authentication.

@@ -680,15 +680,28 @@ with httpx.Client() as client:
client.get('http://s3-eu-west-1.amazonaws.com', auth=aws)
```

Note that the following changes were made compared to `requests-aws4auth`:
- Each request now has its own signing key and `x-amz-date`, meaning **you can use the same auth instance for more than one request**.
- `session_token` was renamed into `security_token` for consistency with the underlying name at Amazon.
- `include_hdrs` parameter was renamed into `include_headers`. When using this parameter:
  - Provided values will not be stripped, [WYSIWYG](https://en.wikipedia.org/wiki/WYSIWYG).
  - If multiple values are provided for the same header, the computation uses the values in the order you provided them, separated by `, `, whereas `requests-aws4auth` sorts the values and separates them with a comma.
- `amz_date` attribute has been removed.
- It is not possible to provide a `date`. It will default to now.
- It is not possible to provide an `AWSSigningKey` instance, use explicit parameters instead.
- It is not possible to provide `raise_invalid_date` parameter anymore as the date will always be valid.
- `host` is not considered as a specific Amazon service anymore (no test specific code).
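
The difference in multi-value handling noted in the list above can be sketched as follows. Both function names are hypothetical and only illustrate the two joining strategies; neither is an API of `httpx_auth` or `requests-aws4auth`.

```python
from typing import List


def join_in_provided_order(values: List[str]) -> str:
    # Hypothetical helper: keep the caller's order and separate values with ", ".
    return ", ".join(values)


def join_sorted(values: List[str]) -> str:
    # Hypothetical helper: sort the values first, then separate them with a bare comma.
    return ",".join(sorted(values))


values = ["b-value", "a-value"]
in_order = join_in_provided_order(values)
in_sorted_order = join_sorted(values)
print(in_order)
print(in_sorted_order)
```

Because the joined value feeds the canonical request, the two strategies produce different signatures whenever a header carries more than one value.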

### Parameters

| Name | Description | Mandatory | Default value |
|:-----------------|:---------------------------|:----------|:--------------|
| `access_id` | AWS access ID. | Mandatory | |
| `secret_key` | AWS secret access key. | Mandatory | |
| `region` | The region you are connecting to, as per [this list](http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region). For services which do not require a region (e.g. IAM), use us-east-1. | Mandatory | |
| `service` | The name of the service you are connecting to, as per [this list](http://docs.aws.amazon.com/general/latest/gr/rande.html). e.g. elasticbeanstalk. | Mandatory | |
| `security_token` | Used for the `x-amz-security-token` header, for use with STS temporary credentials. | Optional | |
| Name | Description | Mandatory | Default value |
|:-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------|:---------------------------------------------------------------------------------------------------------------------------------|
| `access_id` | AWS access ID. | Mandatory | |
| `secret_key` | AWS secret access key. | Mandatory | |
| `region` | The region you are connecting to, as per [this list](http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region). For services which do not require a region (e.g. IAM), use us-east-1. | Mandatory | |
| `service` | The name of the service you are connecting to, as per [this list](http://docs.aws.amazon.com/general/latest/gr/rande.html). e.g. elasticbeanstalk. | Mandatory | |
| `security_token` | Used for the `x-amz-security-token` header, for use with STS temporary credentials. | Optional | |
| `include_headers`  | Set of headers to include in the canonical and signed headers. Special values are `x-amz-*`, which matches any header starting with `x-amz-` (except for `x-amz-client-context`), and `*`, which includes every provided header. | Optional   | `{"host", "content-type", "date", "x-amz-*"}`. If `security_token` is provided, `x-amz-security-token` is also included by default. |
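
A minimal sketch of the matching rules documented for `include_headers`, assuming header names are compared in lowercase. `is_included` is a hypothetical helper written for illustration, not a function exposed by `httpx_auth`.

```python
from typing import Set


def is_included(header: str, include_headers: Set[str]) -> bool:
    # Hypothetical helper mirroring the documented rules.
    name = header.lower()
    # "*" includes every provided header; otherwise match the exact name.
    if "*" in include_headers or name in include_headers:
        return True
    # "x-amz-*" matches any x-amz- header except x-amz-client-context.
    return (
        "x-amz-*" in include_headers
        and name.startswith("x-amz-")
        and name != "x-amz-client-context"
    )


default_headers = {"host", "content-type", "date", "x-amz-*"}
for header in ("Host", "x-amz-date", "x-amz-client-context", "user-agent"):
    print(header, is_included(header, default_headers))
```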

## API key in header

109 changes: 43 additions & 66 deletions httpx_auth/aws.py
@@ -22,8 +22,6 @@ class AWS4Auth(httpx.Auth):

requires_request_body = True

default_include_headers = ["host", "content-type", "date", "x-amz-*"]

def __init__(
self, access_id: str, secret_key: str, region: str, service: str, **kwargs
):
@@ -38,6 +36,12 @@ def __init__(
http://docs.aws.amazon.com/general/latest/gr/rande.html
e.g. elasticbeanstalk.
:param security_token: Used for the x-amz-security-token header, for use with STS temporary credentials.
:param include_headers: Set of headers to include in the canonical and signed headers.
{"host", "content-type", "date", "x-amz-*"} by default.
Note that if security_token is provided, x-amz-security-token is also included by default.
Specific values:
- "x-amz-*" matches any header starting with 'x-amz-' except for x-amz-client-context.
- "*" will include every provided header.
"""
self.secret_key = secret_key
if not self.secret_key:
@@ -48,13 +52,14 @@ def __init__(
self.service = service

self.security_token = kwargs.get("security_token")
# TODO Check if we really need to be able to override this default ?

include_headers = {"host", "content-type", "date", "x-amz-*"}
if self.security_token:
# TODO Avoid modifying shared variable
self.default_include_headers.append("x-amz-security-token")
self.include_headers = kwargs.get(
"include_headers", self.default_include_headers
)
include_headers.add("x-amz-security-token")

self.include_headers = {
header.lower() for header in kwargs.get("include_headers", include_headers)
}

def auth_flow(
self, request: httpx.Request
@@ -77,9 +82,7 @@ def auth_flow(
if self.security_token:
request.headers["x-amz-security-token"] = self.security_token

cano_headers, signed_headers = self._get_canonical_headers(
request, self.include_headers
)
cano_headers, signed_headers = self._get_canonical_headers(request)
cano_req = self._get_canonical_request(request, cano_headers, signed_headers)
sig_string = self._get_sig_string(request, cano_req, scope)
sig_string = sig_string.encode("utf-8")
@@ -122,56 +125,31 @@ def _get_canonical_request(
]
return "\n".join(req_parts)

@classmethod
def _get_canonical_headers(
cls, req: httpx.Request, include: List[str]
) -> Tuple[str, str]:
def _get_canonical_headers(self, req: httpx.Request) -> Tuple[str, str]:
"""
Generate the Canonical Headers section of the Canonical Request.
Return the Canonical Headers and the Signed Headers strs as a tuple
(canonical_headers, signed_headers).

:param include: List of headers to include in the canonical and signed
headers. It's primarily included to allow testing against
specific examples from Amazon. If omitted or None it
includes host, content-type and any header starting 'x-amz-'
except for x-amz-client context, which appears to break
mobile analytics auth if included. Except for the
x-amz-client-context exclusion these defaults are per the
AWS documentation.
"""
include = [x.lower() for x in include]
headers = req.headers.copy()
# Aggregate for upper/lowercase header name collisions in header names,
# AMZ requires values of colliding headers be concatenated into a
# single header with lowercase name. Although this is not possible with
# Requests, since it uses a case-insensitive dict to hold headers, this
# is here just in case you duck type with a regular dict
cano_headers_dict = {}
for hdr, val in headers.items():
hdr = hdr.strip().lower()
val = cls._amz_norm_whitespace(val).strip()
if (
hdr in include
or "*" in include
or (
"x-amz-*" in include
and hdr.startswith("x-amz-")
and not hdr == "x-amz-client-context"
)
included_headers = {}
for header, header_value in req.headers.items():
# "*" includes every header; otherwise match on the exact lowercase name.
if header in self.include_headers or "*" in self.include_headers or (
"x-amz-*" in self.include_headers
and header.startswith("x-amz-")
# x-amz-client-context breaks mobile analytics auth if included
and header != "x-amz-client-context"
):
vals = cano_headers_dict.setdefault(hdr, [])
vals.append(val)
# Flatten cano_headers dict to string and generate signed_headers
cano_headers = ""
signed_headers_list = []
for hdr in sorted(cano_headers_dict):
vals = cano_headers_dict[hdr]
val = ",".join(sorted(vals))
cano_headers += f"{hdr}:{val}\n"
signed_headers_list.append(hdr)
signed_headers = ";".join(signed_headers_list)
return cano_headers, signed_headers
included_headers[header] = _amz_norm_whitespace(header_value)

canonical_headers = ""
signed_headers = []
for header in sorted(included_headers):
signed_headers.append(header)
canonical_headers += f"{header}:{included_headers[header]}\n"

signed_headers = ";".join(signed_headers)

return canonical_headers, signed_headers
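
The flattening step in the new `_get_canonical_headers` body can be tried in isolation. This is a standalone sketch, assuming the headers have already been filtered and normalized; `build_canonical_headers` is a hypothetical name and the header values are made up for the example.

```python
from typing import Dict, Tuple


def build_canonical_headers(included_headers: Dict[str, str]) -> Tuple[str, str]:
    # Hypothetical standalone version of the flattening step.
    names = sorted(included_headers)
    # One "name:value" line per header, in sorted order, each ending with a newline.
    canonical_headers = "".join(f"{name}:{included_headers[name]}\n" for name in names)
    # Signed headers are the same sorted names joined with ";".
    signed_headers = ";".join(names)
    return canonical_headers, signed_headers


canonical, signed = build_canonical_headers(
    {"x-amz-date": "20240204T000000Z", "host": "s3-eu-west-1.amazonaws.com"}
)
print(canonical, end="")
print(signed)
```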

@staticmethod
def _get_sig_string(req: httpx.Request, cano_req: str, scope: str) -> str:
@@ -184,10 +162,9 @@ def _get_sig_string(req: httpx.Request, cano_req: str, scope: str) -> str:
amz_date = req.headers["x-amz-date"]
hsh = hashlib.sha256(cano_req.encode())
sig_items = ["AWS4-HMAC-SHA256", amz_date, scope, hsh.hexdigest()]
sig_string = "\n".join(sig_items)
return sig_string
return "\n".join(sig_items)
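
For reference, the string to sign assembled here has four newline-separated parts: the algorithm name, the `x-amz-date` value, the credential scope, and the hex SHA-256 of the canonical request. The sketch below rebuilds it without an `httpx.Request`; `build_string_to_sign` is a hypothetical name, and the date, scope and canonical request are placeholder values.

```python
import hashlib


def build_string_to_sign(amz_date: str, scope: str, canonical_request: str) -> str:
    # Hypothetical helper: hash the canonical request, then join the four parts.
    digest = hashlib.sha256(canonical_request.encode()).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, digest])


string_to_sign = build_string_to_sign(
    amz_date="20240204T000000Z",
    # Scope layout: <date>/<region>/<service>/aws4_request
    scope="20240204/eu-west-1/s3/aws4_request",
    canonical_request="placeholder canonical request",
)
print(string_to_sign)
```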

def _amz_cano_path(self, path) -> str:
def _amz_cano_path(self, path: str) -> str:
"""
Generate the canonical path as per AWS4 auth requirements.
Not documented anywhere, determined from aws4_testsuite examples,
@@ -233,14 +210,6 @@ def _amz_cano_querystring(qs: str) -> str:
qs = "&".join(sorted(qs_strings))
return qs

@staticmethod
def _amz_norm_whitespace(text: str) -> str:
"""
Replace runs of whitespace with a single space.
Ignore text enclosed in quotes.
"""
return " ".join(shlex.split(text, posix=False))


def generate_key(secret_key: str, region: str, service: str, date: str) -> bytes:
init_key = f"AWS4{secret_key}".encode("utf-8")
Expand All @@ -252,3 +221,11 @@ def generate_key(secret_key: str, region: str, service: str, date: str) -> bytes

def sign_sha256(signing_key: bytes, message: str) -> bytes:
return hmac.new(signing_key, message.encode("utf-8"), hashlib.sha256).digest()
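
The body of `generate_key` is collapsed in this hunk, so the following is only a sketch of the standard Signature Version 4 key derivation chain (date, then region, then service, then the literal `aws4_request`), not a copy of the library's code. `derive_signing_key` and `_hmac_sha256` are hypothetical names and the credentials are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    # Hypothetical helper: one HMAC-SHA256 step of the chain.
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, region: str, service: str, date: str) -> bytes:
    # Each step uses the previous digest as the key for the next message.
    k_date = _hmac_sha256(f"AWS4{secret_key}".encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


key_today = derive_signing_key("example-secret", "eu-west-1", "s3", "20240204")
key_tomorrow = derive_signing_key("example-secret", "eu-west-1", "s3", "20240205")
print(key_today.hex())
```

Since the date is the first input of the chain, the key changes every day, which is consistent with each request computing its own signing key.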


def _amz_norm_whitespace(text: str) -> str:
"""
Replace runs of whitespace with a single space.
Ignore text enclosed in quotes.
"""
return " ".join(shlex.split(text, posix=False)).strip()
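
A quick way to see what this normalization does, using only the standard library. `norm_whitespace` is a hypothetical standalone copy of the one-liner above, and the inputs are made up.

```python
import shlex


def norm_whitespace(text: str) -> str:
    # Non-POSIX mode keeps quoted segments (quotes included) as single tokens,
    # so whitespace inside quotes is left untouched.
    return " ".join(shlex.split(text, posix=False)).strip()


plain = norm_whitespace("  a   b  ")
quoted = norm_whitespace('a  "b   c"  d')
print(repr(plain))
print(repr(quoted))
```

Note that `shlex.split` raises `ValueError` when a value contains an unbalanced quote, so a header value such as `a "b` cannot be normalized this way.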