fix: OpenAI API keys passed as multibyte strings #44

Merged
merged 1 commit into from
Apr 12, 2024

hraban commented Apr 12, 2024

Emacs has two types of strings: multibyte and unibyte. The request library is essentially one giant ‘concat’ call, and ‘concat’ returns a multibyte string if any single component is multibyte, headers included. So even if you encode the body to unibyte, a single multibyte header string undoes that. This happens whether or not the header actually contains multibyte characters: an Emacs string literal containing only ASCII characters is unibyte, but an API key obtained from an external source, e.g. via ‘shell-command-to-string’, will often be multibyte.

Example:

(dolist (x (list
            "x"
            (shell-command-to-string "printf x")
            (encode-coding-string (shell-command-to-string "printf x") 'utf-8)))
  (let ((s (concat x (encode-coding-string "é" 'utf-8))))
    (message
     "%S: %s(%s) %s, %s"
     s
     (multibyte-string-p s)
     (multibyte-string-p x)
     (string-bytes s)
     (length s))))

Output:

"x\303\251": nil(nil) 3, 3
"x\303\251": t(t) 5, 3
"x\303\251": nil(nil) 3, 3

And:

(multibyte-string-p "foo") ; nil
(multibyte-string-p "fôo") ; t
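
The third row of the output above suggests the general remedy: encode the key before it is concatenated with anything else. A minimal sketch of that idea follows. The helper name `my-bearer-header-value` and the "Bearer " prefix are assumptions made for illustration, not code taken from this change, and `string-to-multibyte` only stands in for a key fetched from an external source.

```elisp
;; Hypothetical helper (not from the library): build an Authorization
;; header value that is always a unibyte string.
(defun my-bearer-header-value (api-key)
  "Return \"Bearer API-KEY\" as a unibyte string.
API-KEY may be unibyte or multibyte."
  (concat "Bearer " (encode-coding-string api-key 'utf-8)))

;; Simulate an externally fetched key: ASCII-only, yet multibyte.
(let* ((key (string-to-multibyte "sk-test"))
       (body (encode-coding-string "é" 'utf-8))
       (raw (concat "Bearer " key body))
       (fixed (concat (my-bearer-header-value key) body)))
  (list (multibyte-string-p raw)     ; t: the multibyte key taints the result
        (multibyte-string-p fixed))) ; nil: every component is unibyte
```

Encoding at the point where the header is built keeps the rest of the request code unchanged, since ‘concat’ only produces a multibyte result when at least one argument is multibyte.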

hraban commented Apr 12, 2024

For context: calls break when all of the following hold:

  • the call is made to an OpenAI LLM provider
  • the API key was fetched from an external source (or is a multibyte string for some other reason)
  • the request body contains non-ASCII characters

ahyatt merged commit 4058691 into ahyatt:main Apr 12, 2024

ahyatt commented Apr 12, 2024

Thank you for the fix! I had to fix a similar bug a while ago - I should definitely add tests for multibyte strings so we don't run into this again.
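
A regression test along these lines could look roughly like the ERT sketch below. It is only an illustration: `my-build-request` is a hypothetical stand-in for the code under test, not a function from this project, and `string-to-multibyte` simulates a key that arrives as a multibyte string.

```elisp
(require 'ert)

;; Hypothetical stand-in for the code under test: joins a header value and
;; an already-encoded body, the way a request library might.
(defun my-build-request (header-value body)
  "Return HEADER-VALUE and BODY joined into one unibyte string."
  (concat (encode-coding-string header-value 'utf-8) "\r\n\r\n" body))

(ert-deftest my-request-stays-unibyte ()
  "A multibyte header value must not turn the whole request multibyte."
  (let* ((key (string-to-multibyte "sk-test"))
         (body (encode-coding-string "é" 'utf-8))
         (request (my-build-request key body)))
    ;; Precondition: the simulated key really is multibyte.
    (should (multibyte-string-p key))
    (should-not (multibyte-string-p request))
    ;; In a unibyte string each encoded byte counts as one character.
    (should (= (string-bytes request) (length request)))))
```

The test can be run interactively with `M-x ert` and the test name.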

hraban deleted the fix/multibyte-api-keys branch April 12, 2024 02:18