fix(ingest/bigquery): changes helper function to decode unicode escape sequences #10845

Merged
@@ -15,9 +15,15 @@ def unquote_and_decode_unicode_escape_seq(
     if string.startswith(leading_quote) and string.endswith(trailing_quote):
         string = string[1:-1]

-    cleaned_string = string.encode().decode("unicode-escape")
-
-    return cleaned_string
+    # Decode Unicode escape sequences. This avoid issues with encoding
+    while string.find("\\u") >= 0:
+        index = string.find("\\u")  # The first occurrence of the substring
+        unicode_seq = string[index : (index + 6)]  # The Unicode escape sequence
+        # Replace the Unicode escape sequence with the decoded character
+        string = string.replace(
+            unicode_seq, unicode_seq.encode("utf-8").decode("unicode-escape")
+        )
+    return string


 def parse_labels(labels_str: str) -> Dict[str, str]:
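
For readers comparing the two versions, the sketch below reproduces the removed one-liner and the merged loop as standalone functions and runs both on a string that mixes a literal non-ASCII character with a `\uXXXX` escape. The function names `decode_whole_string`, `decode_escapes_loop`, and `decode_escapes_regex` are hypothetical and not part of the PR. The regex variant is only one possible alternative, shown under the assumption that only well-formed four-digit escapes should be decoded; it is not what was merged.

```python
import re


def decode_whole_string(string: str) -> str:
    # Removed approach: the whole string is UTF-8 encoded, then decoded with
    # the unicode-escape codec, which reads every non-escape byte as Latin-1.
    # Literal non-ASCII characters therefore come back mangled.
    return string.encode().decode("unicode-escape")


def decode_escapes_loop(string: str) -> str:
    # Merged approach, copied from the diff: only the six-character slice
    # starting at each "\u" is passed through the codec.
    while string.find("\\u") >= 0:
        index = string.find("\\u")
        unicode_seq = string[index : (index + 6)]
        string = string.replace(
            unicode_seq, unicode_seq.encode("utf-8").decode("unicode-escape")
        )
    return string


# Hypothetical alternative (not in the PR): match only "\u" followed by
# exactly four hex digits and decode each match in a single pass.
_ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_escapes_regex(string: str) -> str:
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), string)


# A literal e-acute followed by an escaped e-acute.
sample = "caf\u00e9 \\u00e9"

print(decode_whole_string(sample))   # literal character is mangled
print(decode_escapes_loop(sample))   # literal character is preserved
print(decode_escapes_regex(sample))  # same result as the loop here
```

One difference worth noting: the merged loop assumes that every `\u` is followed by four hex digits. An input such as `C:\users` makes the codec raise `UnicodeDecodeError`, whereas the regex sketch leaves that text untouched. Whether such inputs can reach this helper in practice is not stated in the PR.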