Support emoji for MTurk import / export #1773
@YianZhang This is ready for review; please take a look.
characters, e.g. 😀, and replaces each 4-byte character with an
HTML span with the 4 bytes encoded as a JSON array, e.g.:

<span class='emoji-bytes' data-emoji-bytes='[240, 159, 152, 128]'></span>
Use double quotes?
Is there not a way to represent the emoji directly as £
or something?
Actually, I have no idea why I didn't think of HTML entity escaping... it should work. I'll try it out.
It works! I swapped to using HTML entities. Can't believe I didn't think about that.
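For readers following the thread, the HTML-entity approach mentioned above could look roughly like the sketch below. This is not the code from this PR: the function name `escape_four_byte_chars` and the regex constant are hypothetical, and the sketch assumes that "4-byte character" means any code point above U+FFFF, which is what takes 4 bytes in UTF-8.

```python
import re

# Assumption: "4-byte characters" are code points outside the Basic
# Multilingual Plane (U+10000 and above), which need 4 bytes in UTF-8.
_FOUR_BYTE_CHARS = re.compile("[\U00010000-\U0010FFFF]")


def escape_four_byte_chars(s: str) -> str:
    """Replace each 4-byte character with a decimal HTML entity.

    Hypothetical helper for illustration; not the function in this PR.
    """
    return _FOUR_BYTE_CHARS.sub(lambda m: f"&#{ord(m.group())};", s)


print(escape_four_byte_chars("Nice work 😀"))  # Nice work &#128512;
```

Characters that fit in 3 bytes or fewer, such as "é", pass through unchanged, so only the characters Mechanical Turk rejects would be rewritten.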
# Source: https://github.com/charman/mturk-emoji
def replace_emoji_characters(s):
Add type hints
Added.
This function takes a Unicode string containing 4-byte Unicode
characters, e.g. 😀, and replaces each 4-byte character with an
HTML span with the 4 bytes encoded as a JSON array, e.g.:

<span class='emoji-bytes' data-emoji-bytes='[240, 159, 152, 128]'></span>
This is out of date, right?
Updated.
s (Unicode string):
Returns:
Unicode string with all 4-byte Unicode characters in the source
string replaced with HTML spans
Give an example?
Added example.
Nice!
Escapes emoji for Mechanical Turk exporting and importing because Mechanical Turk does not support unescaped emoji. (Error message: "Unsupported characters found")
Also updates the layout to improve whitespace formatting.
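On the import side, the escaped text has to be turned back into emoji. A minimal sketch of that reverse step, assuming the export wrote decimal HTML entities of the form `&#NNNN;`, is shown here. The name `unescape_four_byte_chars` is hypothetical and this is not necessarily how the PR implements it. The sketch restores only entities above U+FFFF, so other entities that annotators may have typed are left alone.

```python
import re

# Matches decimal numeric HTML entities such as "&#128512;".
_DECIMAL_ENTITY = re.compile(r"&#(\d+);")


def unescape_four_byte_chars(s: str) -> str:
    """Turn decimal entities for 4-byte characters back into the characters.

    Hypothetical helper for illustration; not the function in this PR.
    """

    def restore(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Only restore code points that need 4 bytes in UTF-8;
        # leave every other entity exactly as it was.
        if 0x10000 <= code_point <= 0x10FFFF:
            return chr(code_point)
        return match.group(0)

    return _DECIMAL_ENTITY.sub(restore, s)


print(unescape_four_byte_chars("Nice work &#128512;"))  # Nice work 😀
```

If every entity should be decoded instead, the standard library's `html.unescape` is a simpler option, at the cost of also rewriting entities such as `&amp;`.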