Character encoding broken #953

Open
VladimirFokow opened this issue Aug 2, 2024 · 6 comments

VladimirFokow commented Aug 2, 2024

When copying non-English characters (specifically, Ukrainian) from/to the clipboard, the encoding is broken.

To reproduce:

Install pyperclip:

pip install pyperclip

Write a simple python script, e.g.:

import pyperclip

text = pyperclip.paste()  # text is whatever is in the clipboard
pyperclip.copy(text)  # save text to clipboard

Copy some Ukrainian characters to your clipboard, e.g.: тест

When the script is run from the terminal, e.g. python script.py, it works fine: it saves to the clipboard the same thing that was there.

When it's run as a Raycast "Script Command", it saves this to the clipboard instead: ????

If I change the script so that it doesn't get text from the clipboard but instead saves "тест" to clipboard directly: pyperclip.copy('тест'), then I get this in my clipboard: —Ç–µ—Å—Ç
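Both symptoms can be reproduced in plain Python, which hints at where the bytes may go wrong. The sketch below is only an illustration and never touches the clipboard. It assumes, without confirmation from this thread, that the clipboard tools fall back to ASCII or to Mac Roman when they do not recognize the locale.

```python
# Illustration only: reproduce the two garbled results without clipboard access.
original = "тест"

# Symptom 1: characters that cannot be represented in ASCII become "?".
as_ascii = original.encode("ascii", errors="replace").decode("ascii")
print(as_ascii)

# Symptom 2: the UTF-8 bytes are misread as Mac Roman, so every 2-byte
# Cyrillic letter turns into two unrelated characters.
mojibake = original.encode("utf-8").decode("mac_roman")
print(mojibake)

# The second case is reversible because no bytes were lost; the first is not.
recovered = mojibake.encode("mac_roman").decode("utf-8")
print(recovered == original)
```

If this reading is right, the paste step loses information (nothing can be recovered from question marks), while the copy step only mislabels bytes.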

Member

dehesa commented Aug 5, 2024

Hey there @VladimirFokow,

Sorry to hear you are having issues. Sadly, I am not very knowledgeable about Python. We have had similar problems before, but to be honest I don't know what the cause could be. Here is what I said to a previous user.

Author

VladimirFokow commented Aug 5, 2024

Hi, thanks for the reply.
Just tested it with bash - it has the same problem.

Also printed the env variables LANG and PATH:

#!/bin/bash

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title test_cp
# @raycast.mode fullOutput

# Optional parameters:
# @raycast.description Test the clipboard
# @raycast.packageName test_cp
# @raycast.icon 🧪


# Save the content of the clipboard into a variable `text`
text=$(pbpaste)
# Save the content of the variable `text` back into the clipboard
echo "$text" | pbcopy

# Example non-English characters to copy: тест
# result in the clipboard: ????

echo $LANG  # en_DE.UTF-8
echo $PATH  # /usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin

# # Alternative test:
# text='тест'
# echo "$text" | pbcopy
# # result in the clipboard: —Ç–µ—Å—Ç

Regarding "we run the scripts as a subprocess":

Could you please point to the code location where this subprocess is created? (to try isolating the issue)

(unfortunately, I haven't used Swift or Ruby before)
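One thing that may be worth checking is whether the printed LANG value, en_DE.UTF-8, names a locale that actually exists on the machine. The sketch below asks the C library via Python's `locale` module. The helper name `locale_is_available` is made up for this illustration, and the idea that an unknown locale is part of the problem is an assumption, not something established in this thread.

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


# The result depends on which locales are installed on the system.
for candidate in ("en_DE.UTF-8", "en_US.UTF-8", "C"):
    print(candidate, locale_is_available(candidate))
```

If en_DE.UTF-8 comes back as unavailable while en_US.UTF-8 is available, tools that pick their text encoding from the locale could be falling back to a non-UTF-8 default.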

Collaborator

unnamedd commented Aug 6, 2024

@VladimirFokow to me, this sounds very much like a UTF-8/Unicode problem.
Using the example you provided:

import pyperclip

text = pyperclip.paste()  # text is whatever is in the clipboard
pyperclip.copy(text)  # save text to clipboard

Give this piece of code a try and let us know if it works for you:

import pyperclip
import os
import chardet

# Ensure the environment uses UTF-8 encoding
os.environ['PYTHONIOENCODING'] = 'utf-8'

# Function to detect and convert encoding
def convert_to_utf8(text):
    result = chardet.detect(text.encode())
    encoding = result['encoding']
    return text.encode(encoding).decode('utf-8')

# Get text from clipboard
text = pyperclip.paste()

# Convert text to UTF-8
text_utf8 = convert_to_utf8(text)

# Copy text back to clipboard
pyperclip.copy(text_utf8)

print("Text successfully copied to clipboard.")
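A side note on the `PYTHONIOENCODING` line in the snippet above: Python reads that variable when the interpreter starts, so assigning it through `os.environ` inside an already running script does not change the encoding of the current standard streams. The small check below illustrates this; the value `latin-1` is chosen only to make a change visible if one happened.

```python
import os
import sys

before = sys.stdout.encoding

# Setting the variable now does not reconfigure the running interpreter;
# it would only be inherited by child processes started afterwards.
os.environ["PYTHONIOENCODING"] = "latin-1"
after = sys.stdout.encoding

# The two values are identical: the assignment had no effect on this process.
print(before, after, before == after)
```

To change the stream encoding at runtime, `sys.stdout.reconfigure(encoding="utf-8")` is available from Python 3.7 on. Neither approach is likely to affect the clipboard itself if, as I understand it, pyperclip hands the text to the external `pbcopy` and `pbpaste` tools on macOS.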

Author

VladimirFokow commented Aug 6, 2024

Hi @unnamedd, thanks for the idea, but it didn't help.

For experimenting, here is an example "Script Command" which can invoke a python script:

#!/bin/bash

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title test_cp_py
# @raycast.mode fullOutput

# Optional parameters:
# @raycast.description Test the clipboard (Python)
# @raycast.packageName test_cp_py
# @raycast.icon 🐍

# /path/to/python can be seen by calling: `which python3`
/path/to/python /path/to/script.py

The script, with some prints added:
import pyperclip
import os
import chardet

# Ensure the environment uses UTF-8 encoding
print("PYTHONIOENCODING: ", os.environ.get('PYTHONIOENCODING'))
os.environ['PYTHONIOENCODING'] = 'utf-8'
print("PYTHONIOENCODING: ", os.environ.get('PYTHONIOENCODING'))


# Function to detect and convert encoding
def convert_to_utf8(text):
    result = chardet.detect(text.encode())
    print('result, detected with chardet:', result)
    encoding = result['encoding']
    print('encoding:', encoding)
    return text.encode(encoding).decode('utf-8')



text = pyperclip.paste()
print(text)
text_utf8 = convert_to_utf8(text)
print(text_utf8)
pyperclip.copy(text_utf8)

print('\nencoding of "тест":', chardet.detect('тест'.encode()))
print('  just question marks:')
print('encoding of "????":', chardet.detect('тест'.encode()))
print('  symbols that the script produced to the clipboard:')
print('encoding of "????":', chardet.detect('????'.encode()))

Output:

PYTHONIOENCODING:  None
PYTHONIOENCODING:  utf-8
????
result, detected with chardet: {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
encoding: ascii
????

encoding of "тест": {'encoding': 'utf-8', 'confidence': 0.938125, 'language': ''}
  just question marks:
encoding of "????": {'encoding': 'utf-8', 'confidence': 0.938125, 'language': ''}
  symbols that the script produced to the clipboard:
encoding of "????": {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

Done in 0.17s

It looks like the encoding of the clipboard text doesn't survive the transition into the process that Raycast spawns to execute the script.
How is this process created?

It could be useful to isolate the issue by building a minimal reproducible example that creates this process, to see exactly where the encoding problem happens.
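One possible way to isolate this without Raycast is to run the clipboard tool from a small script under different environments and compare the raw bytes. The following is a sketch under the assumption that the locale variables are what differs between the terminal and the spawned process. The helper `run_with_lang` is a hypothetical name, and the `pbpaste` part only runs on macOS.

```python
import os
import shutil
import subprocess
import sys


def run_with_lang(cmd, lang):
    """Run `cmd` with LANG set to `lang` and return its raw stdout bytes."""
    env = dict(os.environ)
    # Drop variables that would take precedence over LANG.
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["LANG"] = lang
    return subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True).stdout


# Portable sanity check: the child process really sees the requested LANG.
probe = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
print(run_with_lang(probe, "en_DE.UTF-8"))

# macOS only: compare what pbpaste returns under two different LANG values.
if shutil.which("pbpaste"):
    for lang in ("en_DE.UTF-8", "en_US.UTF-8"):
        print(lang, run_with_lang(["pbpaste"], lang))
```

Printing bytes rather than decoded text keeps the terminal's own encoding out of the comparison.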

@VladimirFokow
Author

Hi, could someone please point me to the code where the subprocess is created? Thanks!

Quote: "we run the scripts as a subprocess"

Member

dehesa commented Sep 13, 2024

Hey @VladimirFokow, very sorry it took so long to get back to you.
The code spawning the process is part of the closed-source Swift app. We don't do anything special there. Here is an extract showing how we set it up (in Swift):

let process = Process()
process.qualityOfService = qos.asQualityOfService
process.executableURL = URL(fileURLWithPath: command.scriptPath)

var environment = ProcessInfo.processInfo.environment
let defaultPath = "/usr/local/bin:/opt/homebrew/bin"
if let path = environment["PATH"] {
  environment["PATH"] = defaultPath + ":\(path)"
} else {
  environment["PATH"] = defaultPath
}
environment["LANG"] = "\(Locale.preferredIdentifier ?? Locale.autoupdatingCurrent.identifier).UTF-8"
let proxySettings = UserDefaults.standard.useSystemInternetProxySettings ? InternetProxySettings.fromSystem() : InternetProxySettings.fromEnvironment()
environment.merge(proxySettings.toEnvVars()) { $1 }
process.environment = environment
if !arguments.isEmpty {
  process.arguments = arguments
}

if let currentDirectoryPath = command.currentDirectoryPath {
  if currentDirectoryPath.hasPrefix("/") || currentDirectoryPath.hasPrefix("~") {
    process.currentDirectoryPath = currentDirectoryPath
  } else {
    let scriptDirURL = URL(fileURLWithPath: command.scriptPath).deletingLastPathComponent()
    process.currentDirectoryPath = scriptDirURL.appendingPathComponent(currentDirectoryPath).standardizedFileURL.path
  }
} else {
  process.currentDirectoryPath = URL(fileURLWithPath: command.scriptPath).deletingLastPathComponent().path
}
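The extract shows that LANG is built from the preferred locale identifier plus ".UTF-8", which matches the en_DE.UTF-8 value printed earlier in the thread. If that value turns out to be the culprit, a script could override the locale for the clipboard tools it starts. The sketch below is one way to do that from Python. It is not the maintainers' fix and has not been verified inside Raycast; `utf8_env`, `copy_text`, `paste_text`, and the choice of en_US.UTF-8 are assumptions made for illustration.

```python
import os
import subprocess

# Assumption: en_US.UTF-8 is a locale that exists on the machine.
SAFE_LOCALE = "en_US.UTF-8"


def utf8_env(base=None):
    """Copy an environment and force a known UTF-8 locale in it."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = SAFE_LOCALE
    env["LC_ALL"] = SAFE_LOCALE  # LC_ALL takes precedence over LANG and LC_*
    return env


def copy_text(text):
    """Write text to the macOS clipboard via pbcopy (macOS only)."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), env=utf8_env(), check=True)


def paste_text():
    """Read text from the macOS clipboard via pbpaste (macOS only)."""
    result = subprocess.run(["pbpaste"], env=utf8_env(), stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8")


# Pure-Python demonstration of the environment override; no clipboard access.
print(utf8_env({"LANG": "en_DE.UTF-8", "PATH": "/usr/bin"}))
```

In a bash Script Command, the equivalent idea would be to export LANG with a known-good value at the top of the script before calling `pbpaste` or `pbcopy`.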
