Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

describe cursor_pos ambiguity and bump protocol to 5.2 #262

Merged
merged 1 commit into from
Jun 21, 2017

Conversation

minrk
Copy link
Member

@minrk minrk commented May 23, 2017

cursor_pos with characters after 0x10000 is officially ambiguous with respect to surrogate pairs in versions 5.1 and earlier due to a widespread bug in javascript frontends.

Thus bumps the protocol to 5.2 with the sole change that cursor_pos is no longer ambiguous (i.e. clients that correctly implement cursor_pos in 5.1 can bump to 5.2 with no other changes).

cf #259

cross-refs for frontend implementations:

cc @stevengj @rgbkrk @Carreau @blink1073 @williamstein

Many frontends, especially those implemented in javascript,
reported cursor_pos as the interpreter's string index,
which is not the same as the unicode character offset if the interpreter uses UTF-16 (e.g. javascript or Python 2 on macOS),
which stores astral-plane characters such as ``𝐚`` as surrogate pairs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precise U+1D41A ? Because I'm going to assume somewhere it's not going to render correctly ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

- JupyterLab
- nteract
- CoCalc
- Jupyter Console and QtConsole with Python 2 on macOS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we precise versions ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably on Windows, too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed on Windows Python 2 as well.

.. versionchanged:: 5.2

Due to a widespread bug in many frontends, ``cursor_pos``
in versions prior to 5.2 is ambiguous in the presence of astral-plane.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a noun:

in the presence of "astral-plane" characters

(added scare quotes around "astral-plane" since this is a slang term)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar below

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also link the github issue for the technical details?

@stevengj
Copy link

In kernel_info_reply the kernel sends what version of the protocol it implements. Is the highest supported protocol version sent in the reverse direction (from frontend to kernel) anywhere?

@minrk
Copy link
Member Author

minrk commented May 25, 2017

Is the highest supported protocol version sent in the reverse direction (from frontend to kernel) anywhere?

This ought to be the 'version' field in the request header, I believe.

@blink1073
Copy link
Contributor

Currently 5.0 in JupyterLab and Notebook 5.0.

@stevengj
Copy link

stevengj commented May 25, 2017

@minrk, currently kernel_info_request is documented as content = {}. Adding a version field would be a good addition. Can that be added to protocol version 5.2 as well? Oh, never mind, this is already in the message header.

@minrk
Copy link
Member Author

minrk commented Jun 8, 2017

Came back to this. I believe I've addressed comments now, and issued PRs against jupyter/notebook and jupyterlab to handle the transform and bump the version to 5.2.

@minrk minrk force-pushed the 52-unicode branch 2 times, most recently from f0081e6 to 7379294 Compare June 8, 2017 13:52
taking up two indices instead of one, causing a unicode offset
drift of one per astral-plane character.
Not all frontends have this behavior, however,
and after JSON serialization information about encoding was used
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about which encoding was used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

@stevengj
Copy link

stevengj commented Jun 8, 2017

cross-ref: nteract/hydrogen#807

stevengj added a commit to JuliaLang/IJulia.jl that referenced this pull request Jun 8, 2017
cursor_pos with characters after 0x10000 is ambiguous with respect to surrogate pairs
in versions 5.1 and earlier due to widespread bugs in javascript frontends
@minrk minrk merged commit 51f20a6 into jupyter:master Jun 21, 2017
@minrk minrk deleted the 52-unicode branch June 21, 2017 07:26
@minrk minrk modified the milestone: 5.1 Jun 22, 2017
@ophie200 ophie200 mentioned this pull request Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants