Bluesky authentication error: InvalidToken: Unexpected authorization type #6134

Open
GiovanH opened this issue Sep 3, 2024 · 9 comments · May be fixed by #6455

@GiovanH
Contributor

GiovanH commented Sep 3, 2024

I'm experiencing Bluesky authentication issues very similar to #5780, except that clearing the cache doesn't resolve the problem.

$ py -3.11 -m gallery_dl 'https://bsky.app/profile/im.giovanh.com/post/3l2srgwskgt2n' -c gallery.conf --netrc --verbose
[gallery-dl][debug] Version 1.27.3 - Git HEAD: c5147527
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'gallery.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[bluesky][info] Refreshing access token for im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
[urllib3.connectionpool][debug] https://bsky.social:443 "POST /xrpc/com.atproto.server.refreshSession HTTP/1.1" 401 59
[bluesky][debug] Server response: {'error': 'AuthMissing', 'message': 'Authentication Required'}
[bluesky][error] AuthenticationError: "AuthMissing: Authentication Required"

$ py -3.11 -m gallery_dl 'https://bsky.app/profile/im.giovanh.com/post/3l2srgwskgt2n' -c gallery.conf --netrc --verbose --clear-cache bluesky
[gallery-dl][debug] Version 1.27.3 - Git HEAD: c5147527
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'gallery.conf']
[cache][info] Deleted 2 entries from 'C:\Users\Seth\AppData\Roaming\gallery-dl\cache.sqlite3'


$ py -3.11 -m gallery_dl 'https://bsky.app/profile/im.giovanh.com/post/3l2srgwskgt2n' -c gallery.conf --netrc --verbose
[gallery-dl][debug] Version 1.27.4-dev - Git HEAD: 35957216
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'gallery.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1" 200 42
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Awfmgbxfwfqbm7qbokusgg7gr%2Fapp.bsky.feed.post%2F3l37hsm3uud2j&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')

Authentication is defined in netrc.

@GiovanH
Contributor Author

GiovanH commented Nov 12, 2024

@mikf Still seeing this. Looks like:

[cache][info] Deleted 2 entries from 'C:\Users\<USER>\AppData\Roaming\gallery-dl\cache.sqlite3'

> py -3.11 -m gallery_dl https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p --verbose --print-traffic --netrc
[gallery-dl][debug] Version 1.28.0-dev - Git HEAD: 061b27f3
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p'
[bluesky][info] Logging in as im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 64\r\nAuthorization: Basic <TOKEN>\r\n\r\n'
send: b'{"identifier":"im.giovanh.com","password":"<APP PASSWORD>"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 30
header: RateLimit-Remaining: 28
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 30;w=300
header: ETag: <NOT SURE IF SENSITIVE>    
header: Vary: Accept-Encoding
header: Content-Encoding: gzip
[urllib3.connectionpool][debug] https://bsky.social:443 "POST /xrpc/com.atproto.server.createSession HTTP/1.1" 200 None
send: b'GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1\r\n
  Host: bsky.social\r\n
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\n
  Accept: application/json\r\n
  Accept-Language: en-US,en;q=0.5\r\n
  Accept-Encoding: gzip, deflate, br\r\n
  Referer: https://bsky.app/\r\n
  Authorization: Basic <TOKEN>\r\n\r\n'     
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 42
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 3000
header: RateLimit-Remaining: 2997
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 3000;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1" 200 42
send: b'GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1\r\n
  Host: bsky.social\r\n
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\n
  Accept: application/json\r\n
  Accept-Language: en-US,en;q=0.5\r\n
  Accept-Encoding: gzip, deflate, br\r\n
  Referer: https://bsky.app/\r\n
  Authorization: Basic <TOKEN>\r\n\r\n'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 66
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 3000
header: RateLimit-Remaining: 2996
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 3000;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')

It is clearly sending a Basic auth token, the same one that worked in the resolveHandle call. But it's also the same token that was sent with createSession. Is it the wrong one?

I can also confirm this isn't a netrc issue, as passing an explicit --username and --password gives the same result.

@GiovanH
Contributor Author

GiovanH commented Nov 12, 2024

A little debugging:

diff --git a/gallery_dl/extractor/bluesky.py b/gallery_dl/extractor/bluesky.py
index de5d0c6f..7147df0a 100644
--- a/gallery_dl/extractor/bluesky.py
+++ b/gallery_dl/extractor/bluesky.py
@@ -446,6 +446,7 @@ class BlueskyAPI():
 
     def authenticate(self):
         self.headers["Authorization"] = self._authenticate_impl(self.username)
+        self.log.info("Implicit authorization value is %s", self.headers["Authorization"])
 
     @cache(maxage=3600, keyarg=1)
     def _authenticate_impl(self, username):
@@ -483,9 +484,12 @@ class BlueskyAPI():
 
         while True:
             self.authenticate()
+            self.log.info("Calling request with authorization %s", self.headers["Authorization"])
             response = self.extractor.request(
                 url, params=params, headers=self.headers, fatal=None)
 
+            self.log.info("The actual headers sent were %s", response.request.headers)
+
             if response.status_code < 400:
                 return response.json()
             if response.status_code == 429:

With this patch applied, the problem is apparent:

[bluesky][info] Implicit authorization value is Bearer <long token>
[bluesky][info] Calling request with authorization Bearer <long token>
send: b'GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: application/json\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nAuthorization: Basic <token>\r\n\r\n'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Date: Tue, 12 Nov 2024 02:57:05 GMT
header: Content-Type: application/json; charset=utf-8
...
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][info] The actual headers sent were {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': 'application/json', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/', 'Authorization': 'Basic <token>'}
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}   
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')

Note that the "authenticated" call is sent with the Basic authorization token, not the explicitly passed Bearer header! Something is wrong with Extractor's request implementation.

GiovanH linked a pull request Nov 12, 2024 that will close this issue
@GiovanH
Contributor Author

GiovanH commented Nov 12, 2024

OK. I spent my whole day digging into this, but I've found a way to fix this. #6455

@mikf
Owner

mikf commented Nov 12, 2024

Your version of gallery-dl is sending an Authorization: Basic <TOKEN> header during the login process.

[bluesky][info] Logging in as im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 64\r\nAuthorization: Basic <TOKEN>\r\n\r\n'
send: b'{"identifier":"im.giovanh.com","password":"<APP PASSWORD>"}'

This does not happen with vanilla settings or code. The only way this and all the other Authorization: Basic <TOKEN> headers are possible is if the session's auth parameter is set, which gallery-dl only does for danbooru and pixeldrain. I strongly suspect you've made some changes to your gallery-dl code that somewhere set self.session.auth to a username-password tuple, which requests transforms into a Basic authorization header for every request.
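
For readers unfamiliar with the header format, here is a minimal stdlib-only sketch of how a username-password pair turns into a Basic authorization value. The function name `basic_auth_header` is hypothetical and not part of gallery-dl or requests; the `foo`/`bar` credentials are the placeholder values used elsewhere in this thread.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value from credentials."""
    # The credentials are joined with a colon and base64-encoded.
    raw = f"{username}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("foo", "bar"))
```

Because the value is derived only from the credentials, every request carrying the same tuple gets the same `Basic ...` token, which matches the identical `<TOKEN>` seen across the logged requests.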

@GiovanH
Contributor Author

GiovanH commented Nov 12, 2024

> Your version of gallery-dl is sending an Authorization: Basic <TOKEN> header during the login process.

I agree that this is what's happening, but according to every metric I know, gallery-dl is doing this. I am invoking the local module with py -3.11 -m gallery_dl, and can see my exact worktree in #6455. The only thing I have in my global config.json file is

{
  "extractor": {
    "twitter": {
      "cookies": "<path>",
      "logout": true
    }
  }
}

gallery-dl starts passing a basic token to requests as early as

[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession

I can try to identify exactly why this is happening, but everything I see tells me this is vanilla behavior.

@mikf
Owner

mikf commented Nov 12, 2024

There is no Authorization header during login with default settings.

$ gallery-dl --print-traffic -u foo -p bar -v --config-ignore https://bsky.app/profile/bsky.app
[gallery-dl][debug] Version 1.28.0-dev - Git HEAD: cd6d6ea8
[gallery-dl][debug] Python 3.12.7 - Linux-6.11.6-arch1-1-x86_64-with-glibc2.40
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/bsky.app'
[bluesky][debug] Using BlueskyUserExtractor for 'https://bsky.app/profile/bsky.app'
[bluesky][debug] Using BlueskyMediaExtractor for 'https://bsky.app/profile/bsky.app/media'
[bluesky][info] Logging in as foo
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 37\r\n\r\n'
send: b'{"identifier":"foo","password":"bar"}'

@GiovanH
Contributor Author

GiovanH commented Nov 12, 2024

Requests is doing it because there's a netrc value set for bsky.social.

Tracing the initial login request:

https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/common.py#L172
kwargs['headers'] is {'Content-Type': 'application/json'}

https://github.com/psf/requests/blob/main/src/requests/sessions.py#L575
req.headers is {'Content-Type': 'application/json'}
auth is None

https://github.com/psf/requests/blob/main/src/requests/sessions.py#L478-L481
This is called; requests sets auth from netrc even though gallery-dl did not explicitly request it

https://github.com/psf/requests/blob/main/src/requests/sessions.py#L498
self.auth is None
auth is a username/password tuple
request.headers is {'Content-Type': 'application/json'}
self.headers is

{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/'}

p.headers is

{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/', 'Content-Type': 'application/json', 'Content-Length': '64', 'Authorization': 'Basic TOKEN'}

Temporarily removing my bsky.social netrc entry works around the problem, even on origin/master.
I hate this, but I still think gallery-dl should handle this case somehow.
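
The host-based lookup described in the trace can be approximated with Python's standard `netrc` module. This is only a sketch of the idea, not the code requests actually runs: `netrc_credentials` and `demo` are hypothetical helpers, and the login and password are made-up values written to a throwaway file rather than a real `~/.netrc`.

```python
import netrc
import os
import tempfile
from urllib.parse import urlparse


def netrc_credentials(url, netrc_path):
    """Return (login, password) for the URL's host, or None if no entry matches."""
    host = urlparse(url).hostname
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, account, password = entry
    return (login or account, password)


def demo():
    # Hypothetical credentials in a temporary netrc file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as fh:
            fh.write("machine bsky.social login user.example password app-password\n")
        match = netrc_credentials(
            "https://bsky.social/xrpc/com.atproto.server.createSession", path)
        miss = netrc_credentials("https://example.org/", path)
    return match, miss


print(demo())
```

The lookup depends only on the host name, so any request to bsky.social picks up the entry regardless of which endpoint is called or which headers the caller set.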

@mikf
Owner

mikf commented Nov 12, 2024

Wow, now that's something I would have never expected. Good find.

I'll probably add an option to disable requests' trust_env and/or use a "noop" session auth so requests doesn't try to interfere.
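
The two ideas mentioned here can be tried out offline with `Session.prepare_request`, which builds the outgoing headers without any network I/O. This is a rough sketch, not what gallery-dl ends up doing: `sent_authorization` and `noop_auth` are hypothetical names, the netrc entry and Bearer token are invented, and it assumes a requests version that honors the `NETRC` environment variable.

```python
import os
import tempfile

import requests

URL = "https://bsky.social/xrpc/app.bsky.feed.getPostThread"


def noop_auth(prepared_request):
    """Session auth callable that leaves the request untouched."""
    return prepared_request


def sent_authorization(trust_env=True, session_auth=None):
    """Return the Authorization header requests would send for URL."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as fh:
            # Hypothetical netrc entry for the API host.
            fh.write("machine bsky.social login user.example password app-password\n")
        previous = os.environ.get("NETRC")
        os.environ["NETRC"] = path
        try:
            session = requests.Session()
            session.trust_env = trust_env
            if session_auth is not None:
                session.auth = session_auth
            request = requests.Request(
                "GET", URL, headers={"Authorization": "Bearer example-token"})
            return session.prepare_request(request).headers["Authorization"]
        finally:
            # Restore the caller's environment.
            if previous is None:
                del os.environ["NETRC"]
            else:
                os.environ["NETRC"] = previous


print(sent_authorization())                        # netrc entry replaces the Bearer header
print(sent_authorization(trust_env=False))         # explicit header survives
print(sent_authorization(session_auth=noop_auth))  # explicit header survives
```

Note that `trust_env = False` also turns off other environment-derived settings such as proxies, while a no-op session auth only suppresses the netrc lookup.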

mikf added a commit that referenced this issue Nov 15, 2024
(#6134, #6455)
disable using environment proxies by default
@mikf
Owner

mikf commented Nov 15, 2024

Should be fixed with 0a72a50 by preventing requests from gathering .netrc auth and overwriting Authorization headers with it.
