Failed to download and register model using s3 presigned v4 url #669

Closed
vishal-wiai opened this issue Sep 8, 2020 · 11 comments

@vishal-wiai

I'm using the management API to register a model from an HTTPS URL, specifically an S3 presigned URL. TorchServe successfully downloads and registers models from S3 v2 signature URLs but fails to download from v4 signature URLs. Is this a bug?

I checked this behaviour with three models and also verified the URL permissions. I can download the files with requests.get(), but TorchServe cannot.

@harshbafna harshbafna self-assigned this Sep 8, 2020
@harshbafna
Contributor

@vishal-wiai: I was able to successfully register a model using an s3 pre-signed v4 URL.

Note: I had to replace each & character in the pre-signed URL with its URL-encoded form, %26, in the curl command.

@harshbafna harshbafna added the triaged_wait Waiting for the Reporter's resp label Sep 8, 2020
@vishal-wiai
Author

Yes, thank you @harshbafna. It worked after encoding & in the URL as %26.

As a feature request, is it possible for TorchServe to do this encoding, under the hood?

@vishal-wiai
Author

It works when you only provide the url parameter but fails when you provide additional params like initial_workers, batch_size, etc..
Any other encoding required?

@maaquib
Collaborator

maaquib commented Sep 8, 2020

@vishal-wiai Can you provide the HTTPS URL along with the parameters you are using?

@harshbafna
Contributor

@vishal-wiai:

> As a feature request, is it possible for TorchServe to do this encoding, under the hood?

This is how HTTP query strings work: the parameters that follow ? in a URL are separated by the & character. So if a parameter value, in your case the model URL, contains an & character, that character must be percent-encoded with its hexadecimal value (%26). This has to be handled on the client side.

> It works when you only provide the url parameter but fails when you provide additional params like initial_workers, batch_size, etc..
> Any other encoding required?

You only need to encode the & characters inside the model URL value. The other parameters should still be separated with a literal & character.

Example

curl -X POST 'http://localhost:8081/models?url=<http_encoded_presigned_s3_url>&initial_workers=1'
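
To make the splitting behaviour described above concrete, here is a minimal sketch using only Python's standard urllib.parse. It assumes TorchServe splits the query string the same way a standard parser does, which this thread does not confirm. The names model_url and received_params are made up for illustration, and the URL is a shortened placeholder, not a real signed URL.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical pre-signed URL, shortened for illustration (not a real signature).
model_url = (
    "https://bucket.s3.amazonaws.com/model.mar"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600"
)


def received_params(request_url):
    """Return the query parameters as a standard parser would split them."""
    return dict(parse_qsl(urlsplit(request_url).query))


base = "http://localhost:8081/models?url="

# Unencoded: the & inside model_url ends the url parameter early,
# so X-Amz-Expires shows up as a separate top-level parameter.
raw = received_params(base + model_url + "&initial_workers=1")

# Encoded: %26 is turned back into & only after the parameters are split,
# so the url parameter carries the whole model URL.
encoded = received_params(base + model_url.replace("&", "%26") + "&initial_workers=1")

print(raw)
print(encoded)
```

In the first case the parser sees three parameters (url, X-Amz-Expires, initial_workers) and the url value is cut off at the first &. In the second it sees exactly url and initial_workers.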

@vishal-wiai
Author

@harshbafna curl did work for me, but using requests gives me an HTTP 400 error (for both the encoded and the decoded URL). Am I missing something?

import requests

params = (('url', '<http-presign-s3-url>'), ('initial_workers', '1'))
response = requests.post('http://localhost:8081/models', params=params)

@harshbafna
Contributor

@vishal-wiai: This seems to be a problem with the requests module in Python, which by default encodes every special character supplied in params.

A simple approach would be URL formation through string concatenation:

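import requests
# model_url holds the pre-signed S3 URL. Only the & characters inside it are encoded; the & before initial_workers stays literal.
url = 'http://localhost:8081/models?url=' + model_url.replace('&', '%26') + '&initial_workers=1'
response = requests.post(url)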

Reference: psf/requests#1454 (comment)
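
One thing worth checking with the concatenation approach is what the url value looks like after a single round of decoding. The sketch below compares replacing only & with percent-encoding the whole value via urllib.parse.quote. The second variant is an assumption, not something suggested or confirmed in this thread, and Python's parser is only standing in for whatever TorchServe does. The URL, the credential value, and the helper name decoded_url_param are all made up.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical URL shaped like a v4 pre-signed URL; all values are made up.
model_url = (
    "https://bucket.s3.amazonaws.com/model.mar"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=AKIAEXAMPLE%2F20201208%2Fus-east-1%2Fs3%2Faws4_request"
)


def decoded_url_param(request_url):
    """Decode the url parameter once, as a standard query-string parser would."""
    return parse_qs(urlsplit(request_url).query)["url"][0]


base = "http://localhost:8081/models?url="
tail = "&initial_workers=1"

# Variant 1 (from the thread): encode only the & characters.
only_amp = base + model_url.replace("&", "%26") + tail

# Variant 2 (assumption): percent-encode every reserved character, including %.
full = base + quote(model_url, safe="") + tail

print(decoded_url_param(only_amp) == model_url)
print(decoded_url_param(full) == model_url)
```

With variant 1 the comparison is False, because the %2F sequences in the credential come out as plain / after decoding. With variant 2 it is True, since the value round-trips exactly. Whether that difference matters to S3 or to TorchServe is not established here.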

@dotel-saramsz

dotel-saramsz commented Dec 8, 2020

I tried all of the techniques suggested above but still could not get TorchServe to download the model from an S3 presigned URL.
The presigned URL I got has the following format:

https://<bucket-name>.s3.amazonaws.com/<object-name>?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=<some-credential>%2F<date_string>%2F<aws-region>%2F...

When I send this url as part of the POST request to register the model:

curl -X POST 'http://localhost:8081?url=<url_given_above>&model_name=mymodel'

I get back a 400 Bad Request response indicating that the request failed due to bad syntax. I tried replacing the & in the presigned URL with %26, but to no avail.
The model can be downloaded from the presigned URL through my browser and with curl on my local machine, but not through TorchServe. It looks like the request fails whenever the url param of the models/ POST request itself contains query params (here, the AWS-specific credentials).

Any pointers would be appreciated.

@harshbafna
Contributor

@dotel-saramsz : Are you able to download the mar file with a simple curl/wget command using this pre-signed-URL?

@dotel-saramsz

dotel-saramsz commented Dec 8, 2020

@harshbafna Yes, I am able to download with a simple curl command using the pre-signed url

@Iron-Stark

Iron-Stark commented Oct 26, 2021

@dotel-saramsz @harshbafna Was this issue resolved? I am facing a similar problem right now. This is what my code snippet looks like:


    import logging

    import boto3
    import requests
    from botocore.exceptions import ClientError

    # Method excerpt; the enclosing class is not shown in the original post.
    def create_presigned_url(self, bucket_name, object_name, expiration=3600):
        """Generate a presigned URL to share an S3 object

        :param bucket_name: string
        :param object_name: string
        :param expiration: Time in seconds for the presigned URL to remain valid
        :return: Presigned URL as string. If error, returns None.
        """

        # Generate a presigned URL for the S3 object
        s3_client = boto3.client('s3')
        try:
            response = s3_client.generate_presigned_url('get_object',
                                                        Params={'Bucket': bucket_name,
                                                                'Key': object_name},
                                                        ExpiresIn=expiration)
        except ClientError as e:
            logging.error(e)
            return None

        # The response contains the presigned URL
        return response

    # Called from elsewhere in the same class; the presigned URL is inserted as-is, without encoding.
    presigned_uri = self.create_presigned_url(bucket_name, object_name)
    response = requests.post("http://127.0.0.1:8081/models?url={}".format(presigned_uri))

A simple get on the presigned uri is working but the model management API throws an error.

I tried doing all the recommended things like replacing & with %26, etc. but nothing seems to work.
