Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passwords containing special characters are not decoded correctly #145

Open
fgimian opened this issue Oct 2, 2019 · 17 comments
Open

Passwords containing special characters are not decoded correctly #145

fgimian opened this issue Oct 2, 2019 · 17 comments

Comments

@fgimian
Copy link

fgimian commented Oct 2, 2019

Hey there guys, our database password contains a # character and databases doesn't seem to like it 😄

In [25]: url = DatabaseURL("postgresql://username:password%23123@localhost/name")

In [26]: url
Out[26]: DatabaseURL('postgresql://username:********@localhost/name')

In [27]: url.password
Out[27]: 'password%23123'

I believe after urlsplit is called, you may also need to unquote each URL component?

If you do not encode the password, urlsplit fails to split things properly:

In [33]: url = DatabaseURL("postgresql://username:password#123@localhost/name")

In [34]: url.username

In [35]: url.password

In [36]: url.hostname
Out[36]: 'username'

In [37]: url.port
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-9d4372beb709> in <module>
----> 1 url.port

~/.virtualenv/vsparc-api-fastapi/lib64/python3.6/site-packages/databases/core.py in port(self)
    355     @property
    356     def port(self) -> typing.Optional[int]:
--> 357         return self.components.port
    358
    359     @property

/usr/lib64/python3.6/urllib/parse.py in port(self)
    167         port = self._hostinfo[1]
    168         if port is not None:
--> 169             port = int(port, 10)
    170             if not ( 0 <= port <= 65535):
    171                 raise ValueError("Port out of range 0-65535")

ValueError: invalid literal for int() with base 10: 'password'

This particular Python issue may be of relevance too.

Any help is greatly appreciated
Fotis

@fgimian
Copy link
Author

fgimian commented Oct 2, 2019

I've done lots more digging on this. I'm using databases with a PostgreSQL database (i.e. with the asyncpg backend. The reason this issue gets trickier is that databases passes the dsn directly to the asyncpg library which runs urlparse on it (see this for the related code).

Now the issue is, urlparse essentially results in the same output as urlsplit and also does not unquote the netloc property of the result which includes the username and password. As such, it also always fail to connect when a dsn is supplied with urlencoded username and password.

There seems to be an issue relating to it in the asyncpg repository which I hope to read into further now.

When using asyncpg directly, I would typically not use a dsn and pass individual elements which worked perfectly.

Cheers
Fotis

@euri10
Copy link
Member

euri10 commented Oct 3, 2019

I've done lots more digging on this. I'm using databases with a PostgreSQL database (i.e. with the asyncpg backend. The reason this issue gets trickier is that databases passes the dsn directly to the asyncpg library which runs urlparse on it (see this for the related code).

I found that you can hack that behaviour the following way, it won't solve your pound symbol in password issue as it's a bug upstream, still you may avoid passing a full DSN

import asyncio

from databases import Database, DatabaseURL


async def main():
    dburl = DatabaseURL('postgresql://')
    database = Database(dburl, host='localhost', port=5438, user="postgres", password="postgres")

    await database.connect()

    query = """SELECT * FROM pg_catalog.pg_tables"""
    r = await database.execute(query=query)
    print(r)


if __name__ == '__main__':
   loop = asyncio.get_event_loop()
   loop.run_until_complete(main())

@fgimian
Copy link
Author

fgimian commented Oct 3, 2019

@euri10 Great trick, thanks heaps! Though I do believe that databases itself also has a little problem in that it doens't unquote parts of the URL (particularly the netloc) after using urlsplit in the DatabaseURL class. There's a pull request in the asyncpg repo to repair the problem in the library, and it's essentially using unquote to ensure everything is as it should be.

@gvbgduh
Copy link
Member

gvbgduh commented Oct 3, 2019

@euri10 @fgimian you can also try url-encoding as asyncpg will decode it back before actual use.

@fgimian
Copy link
Author

fgimian commented Oct 3, 2019

@gvbgduh Thanks so much, yeah I think that the fix on asyncpg to do this was only just merged into master. So this should be safe for the next release or if you're installing from master. The latest release version (without the fix) doesn't decode URLs and therefore won't work. See MagicStack/asyncpg#436 for more info 😄

@fgimian
Copy link
Author

fgimian commented Oct 3, 2019

The only little thing that's a bit inelegant is when interrogating credentials via the databases library, youl'l still get URL encoded ouptut:

e.g.

In [1]: from databases import Database

In [2]: database = Database("postgresql://username:password%23123@localhost/name")

In [3]: database.url.password
Out[3]: 'password%23123'

I'm not quite sure how much work the databases library should do here. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.

What do you think?

Really appreciate your help!
Fotis

@tomchristie
Copy link
Member

I'm not quite sure how much work the databases library should do here. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.
But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.

That seems correct to me, yeah.
We should apply unquoting to all the string-type accessor methods, including url.password .

@taybin
Copy link
Contributor

taybin commented Oct 8, 2019

I've run into this as well, with other characters such as @ in the password. I thought this was a Starlette issue and opened up this issue: encode/starlette#662, but it looks like it's a little deeper.

@gvbgduh
Copy link
Member

gvbgduh commented Oct 9, 2019

@, : and / are a bit tricky as it might break things down the road with the asyncpg parser, though, it's getting fixed (MagicStack/asyncpg#419).
So, unquoting + #129 should do the trick.

@cypai
Copy link

cypai commented Oct 24, 2019

Looks like asyncpg has been updated:
MagicStack/asyncpg@89815ea

But even after upgrading asyncpg, this issue still shows up here.

@taybin
Copy link
Contributor

taybin commented Oct 28, 2019

One thing that might be useful is support for urlencoding in Starlette.Configuration, somewhere. No idea what it'd look like yet. I know that doesn't solve the issue for databases, but that seems like the right level for this sort of fix.

@taybin
Copy link
Contributor

taybin commented Oct 28, 2019

I put a proof of concept in encode/starlette#695.

@yuan1238y
Copy link

The reason for this issue is that password is not urlencoded string.
I'm fix this issue using this way:
database/core.py:DatabaseURL

from urllib.parse import unquote
...
    @property
    def username(self) -> typing.Optional[str]:
        return unquote(self.components.username)

    @property
    def password(self) -> typing.Optional[str]:
        return unquote(self.components.password)
...

For using:

from urllib.parse import quote
user = quote(user)  # e.g. user#001
password = quote(password)  # e.g. P@ssw0rd#
url = f"{dialect}://{user}:{password}@{host}:{port}/{db}..."

@afkjm
Copy link

afkjm commented Jul 8, 2020

It's possible to "monkey patch" DatabaseURL to allow percent encoding and decoding of specific properties.

import typing
from urllib.parse import unquote
from databases import Database, DatabaseURL

# monkey patch DatabaseURL to allow percent encoded passwords
# https://stackoverflow.com/a/36158137
def _password(self) -> typing.Optional[str]:
    return unquote(self.components.password)

setattr(DatabaseURL, 'password', property(_password))

db = Database(...)

@abe-winter
Copy link

the cloud sql DSNs mentioned in MagicStack/asyncpg#419 really break the DatabaseURL parser -- because Database refuses to pass a DSN down to the backend without parsing it, these hostnames will always end up null.

Example:

# hostname is null:
>>> DatabaseURL('postgresql:///?host=/unix/socket').hostname
# replace() doesn't help:
>>> DatabaseURL('postgresql:///?host=/unix/socket').replace(hostname='/unix/socket').hostname
>>> 

This is extra-bad because:

  • it triggers a ConnectionError in asyncpg which is hard to debug because there's no host
  • the Database constructor is in module scope (per your docs), so this is stopping servers from coming up at all -- really hard to debug in cloud, blocks most error capture systems from working

Could you pass the DSN down to the backend unmodified? I'm not familiar with this project, so there may be good reasons not to do this, but it would make life easier for parsing edge cases like this one.

@pmsoltani
Copy link

Great discussion!

I was curious that I didn't come across this issue before, since my passwords usually do contain special characters. Today, however, is my first brush with Databases. I use SQLAlchemy most of the time and noticed it uses regex to discern URL components:

sqlalchemy/lib/sqlalchemy/engine/url.py

def _parse_rfc1738_args(name):
    pattern = re.compile(
        r"""
            (?P<name>[\w\+]+)://
            (?:
                (?P<username>[^:/]*)
                (?::(?P<password>.*))?
            @)?
            (?:
                (?:
                    \[(?P<ipv6host>[^/]+)\] |
                    (?P<ipv4host>[^/:]+)
                )?
                (?::(?P<port>[^/]*))?
            )?
            (?:/(?P<database>.*))?
            """,
        re.X,
    )
    ...

I tried a simple database URL in regex101 using SQLAlchemy's pattern and it works, although clearly, this is not a comprehensive test:

image

@mryoho
Copy link

mryoho commented Nov 4, 2022

I was running into the issue with special characters in the URL and came across this issue. This was helpful.

Also it looks like unquote() is now used in the DatabaseURL class, so I believe the intended way to do pass passwords to the Database object would be to do something along the lines of what @yuan1238y suggested.

To quote them, the following now works without monkeypatching:

from urllib.parse import quote
user = quote(user)  # e.g. user#001
password = quote(password)  # e.g. P@ssw0rd#
url = f"{dialect}://{user}:{password}@{host}:{port}/{db}..."

It would be nice to have this added to the documentation I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests