DataLake async file client cannot download using SAS tokens restricted to specific folders #23804

Closed
simebg opened this issue Apr 4, 2022 · 7 comments · Fixed by #23854
Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Storage: Storage Service (Queues, Blobs, Files)

simebg commented Apr 4, 2022

  • Package Name: azure-storage-file-datalake
  • Package Version: 12.6.0
  • Operating System: Ubuntu 20.04
  • Python Version: 3.9.7

Describe the bug
SAS tokens scoped to a specific folder in a storage account don't work with the async client, although they do work with the sync one.

SAS tokens generated on the whole container do work on both async and sync clients.

To Reproduce
Replace relevant variables in the following code, then execute it:

path = '...'
container = '...'
account_url = '...'

# sas token on specific folder (works on sync, doesn't work on async)
sas_token = '...'
# sas token on container (works on both sync and async)
# sas_token = '...'

def sync_run():
    print('Trying synchronous API...')

    from azure.storage.filedatalake import DataLakeFileClient

    with DataLakeFileClient(account_url, container, path, sas_token) as client:
        data = client.download_file().readall()
        print(f'Read {len(data)} bytes.')

def async_run():
    async def run():
        print('Trying asynchronous API...')

        from azure.storage.filedatalake.aio import DataLakeFileClient

        async with DataLakeFileClient(account_url, container, path, sas_token) as client:
            data = await (await client.download_file()).readall()
            print(f'Read {len(data)} bytes.')

    import asyncio
    asyncio.get_event_loop().run_until_complete(run())


if __name__ == '__main__':
    sync_run()
    async_run()

Expected behavior
Output should be something similar to:

$ python3 test.py 
Trying synchronous API...
Read 3060 bytes.
Trying asynchronous API...
Read 3060 bytes.

However, I get the following:

$ python3 test.py 
Trying synchronous API...
Read 3060 bytes.
Trying asynchronous API...
Traceback (most recent call last):
  File "/home/azure-async-test/test.py", line 64, in <module>
    async_run()
  File "/home/azure-async-test/test.py", line 59, in async_run
    asyncio.get_event_loop().run_until_complete(run())
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/azure-async-test/test.py", line 55, in run
    data = await (await client.download_file()).readall()
  File "/home/azure-async-test/.venv/lib/python3.9/site-packages/azure/storage/filedatalake/aio/_data_lake_file_client_async.py", line 481, in download_file
    downloader = await self._blob_client.download_blob(offset=offset, length=length, **kwargs)
  File "/home/azure-async-test/.venv/lib/python3.9/site-packages/azure/core/tracing/decorator_async.py", line 74, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/home/azure-async-test/.venv/lib/python3.9/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 480, in download_blob
    await downloader._setup()  # pylint: disable=protected-access
  File "/home/azure-async-test/.venv/lib/python3.9/site-packages/azure/storage/blob/aio/_download_async.py", line 250, in _setup
    self._response = await self._initial_request()
  File "/home/azure-async-test/.venv/lib/python3.9/site-packages/azure/storage/blob/aio/_download_async.py", line 336, in _initial_request
    process_storage_error(error)
  File "/home/azure-async-test/.venv/lib/python3.9/site-packages/azure/storage/blob/_shared/response_handlers.py", line 181, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.ClientAuthenticationError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:46dbc42d-701e-0044-2024-48986f000000
Time:2022-04-04T13:04:00.5021799Z
ErrorCode:AuthenticationFailed
authenticationerrordetail:Invalid resource path
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:46dbc42d-701e-0044-2024-48986f000000
Time:2022-04-04T13:04:00.5021799Z</Message><AuthenticationErrorDetail>Invalid resource path</AuthenticationErrorDetail></Error>

Additional context
The test script was run in a fresh venv with only azure-storage-file-datalake (12.6.0) and aiohttp (3.8.1) installed.

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 4, 2022
@azure-sdk azure-sdk added Client This issue points to a problem in the data-plane of the library. needs-team-triage Workflow: This issue needs the team to triage. Storage Storage Service (Queues, Blobs, Files) labels Apr 4, 2022
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 4, 2022
@l0lawrence
Member

Thanks for the feedback, we'll investigate asap.

@l0lawrence l0lawrence removed the needs-team-triage Workflow: This issue needs the team to triage. label Apr 4, 2022
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Apr 4, 2022
@vincenttran-msft
Member

Hi @simebg Emilian, would you be able to provide a more concrete example for the relevant variables you've blocked out? Of course please omit any sensitive information, but I am mostly interested in the following:

  1. Your file and directory structure, which can be shown with a more concrete path and container example (even if that means changing the names to things such as foo or blah)
  2. What method(s) you are using to generate your sas_token and the parameters you are providing to the method(s)

With that, I believe the team will be able to better assist you!

Thanks!

@simebg
Author

simebg commented Apr 5, 2022

Hi @vincenttran-msft, here is the information you asked for. Please let me know if you require more information.

Here is an example of the variables and the file structure:

# variables
path = 'foo/bar'
container = 'test'
account_url = 'https://foo.blob.core.windows.net/'

# file structure
https://foo.blob.core.windows.net/
└── test (container)   <- generating SAS token on container works with async
    └── foo            <- generating SAS token on folder doesn't work with async
        └── bar

The ACL permissions on the container, the folder, and the file have all been left at their defaults.

For generating the SAS token, I used the web interface in the Azure portal and generated one token for the container and one token for the folder.

@vincenttran-msft
Member

Hi @simebg Emilian, thanks for the extra context, I really appreciate it!

We have discovered a slight difference in how the sync and async clients parse these parameters, and we are currently working on a bug fix so that both clients behave the same (and correctly).
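To illustrate why a stray / can matter, here is a minimal sketch. It is not the SDK's actual code: `naive_resource_path` is a hypothetical helper that simply assumes the request path is formed by joining the container and the path with a /. Under that assumption, a leading / in `path` produces a double slash, so the requested path may no longer match the path the folder-scoped SAS was issued for, which would be consistent with the `Invalid resource path` detail in the error above.

```python
from urllib.parse import quote


def naive_resource_path(container: str, path: str) -> str:
    """Hypothetical join of container and path; NOT how the SDK builds URLs."""
    return "/" + quote(container) + "/" + quote(path, safe="/")


# Sample values modeled on the ones posted in this thread.
print(naive_resource_path("test", "foo/bar"))   # /test/foo/bar
print(naive_resource_path("test", "/foo/bar"))  # /test//foo/bar
```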

In the meantime, from our testing and looking into your RequestID, I think the following should resolve your issues with failing async operations:

  1. Ensure your path does not have leading (or extra) / characters
  2. Ensure your container does not have any trailing (or extra) / characters
  3. Check that there are no extraneous / characters anywhere else in your user-provided input
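The three checks above can be sketched as a small normalization step applied to the inputs before constructing either client. `strip_slashes` is a hypothetical helper, not part of azure-storage-file-datalake, and the sample values are made up to mirror this thread.

```python
def strip_slashes(value: str) -> str:
    """Drop leading/trailing '/' and collapse repeated '/' inside the value."""
    segments = [segment for segment in value.split("/") if segment]
    return "/".join(segments)


# Hypothetical inputs with the kinds of extra slashes described above.
container = strip_slashes("test/")
path = strip_slashes("/foo//bar")

print(container)  # test
print(path)       # foo/bar
```

The cleaned `container` and `path` can then be passed to `DataLakeFileClient` in place of the raw values. Note that this sketch also collapses doubled slashes inside the path, which is only appropriate if those are unintentional.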

Hopefully this will resolve your issue. However, if it does not, please don't hesitate to reply back to this thread, preferably with a RequestID (a recent one as the logs autoclear on a rolling basis) and we will do our best to resolve the issue.

Thanks!

@navba-MSFT navba-MSFT self-assigned this Apr 6, 2022
@navba-MSFT navba-MSFT added needs-author-feedback Workflow: More information is needed from author to address the issue. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team CXP Attention labels Apr 6, 2022
@navba-MSFT
Contributor

@simebg Could you please let us know whether you had a chance to try the steps above? Awaiting your reply.

@simebg
Author

simebg commented Apr 6, 2022

Thank you for looking into this @vincenttran-msft. You are correct about the extra /: I did use a leading / in the path. After removing it, both sync and async worked for me with the folder-specific SAS token.

$ python3 test.py 
Trying synchronous API...
Read 3060 bytes.
Trying asynchronous API...
Read 3060 bytes.

Hello @navba-MSFT, I just tried @vincenttran-msft's suggestion, and removing any trailing and leading / from the variables fixed the issue with the async client.

Thank you both!

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Apr 6, 2022
@navba-MSFT navba-MSFT removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Apr 6, 2022
@navba-MSFT
Contributor

@simebg Thanks for getting back to us. We will now proceed with closing this GitHub issue. If you need any further assistance with this issue in the future, please feel free to reopen this thread. We would be happy to help.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023