Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell ID JEP Implementation #189

Merged
merged 2 commits into from
Dec 1, 2020
Merged

Cell ID JEP Implementation #189

merged 2 commits into from
Dec 1, 2020

Conversation

MSeal
Copy link
Contributor

@MSeal MSeal commented Aug 27, 2020

Introduces 4.5 format with cell ids added following https://github.com/jupyter/enhancement-proposals/blob/master/62-cell-id/cell-id.md. Cell ids are unique, and restricted in the character set is allowed.

@MSeal MSeal changed the title [WIP for JEP proposal review] Initial pass at adding proposed cell ids Cell ID JEP Implementation Oct 2, 2020
@MSeal
Copy link
Contributor Author

MSeal commented Oct 2, 2020

Tested in lab and classic against:

Is there a good way to avoid some of the issues such as #167? Currently in JupyterLab 2.1.2 one can open a notebook with format 4.5, add cells, then save resulting in a 4.5 notebook without cell_ids.
successfully.

However when loading and then saving a fully valid 4.5 notebook from classic or lab, the id field is being stripped out between the nbformat.read and nbformat.write calls. The auto-fill for notebook cell ids addressing the above concern is repopulating the document on validation with new ids, so it doesn't throw any errors, but instead replaces all ID every save. I tested using nbformat's read and then write on their own and the ids are preserved. A look at the code paths in notebook server didn't reveal to me why this is being stripped out. @jasongrout or @minrk would you maybe know what might be causing that in the server side of the equation so we can patch that behavior?

@MSeal
Copy link
Contributor Author

MSeal commented Oct 2, 2020

@willingc Do you have permissions to the appveyor for nbformat? I don't seem to have permissions disable that build which was replaced by github actions in a separate PR.

@willingc
Copy link
Member

willingc commented Oct 2, 2020

@vidartf Do you know how to completely remove appveyor on this project? Deleted the webhook but didn't seem to have any effect.

@MSeal Can you push again to this PR?

@vidartf
Copy link
Contributor

vidartf commented Oct 3, 2020

@willingc I should have removed the project from appveyor now I think 👍

@willingc
Copy link
Member

willingc commented Oct 3, 2020

Thanks @vidartf 👍

@MSeal MSeal requested review from minrk and takluyver October 6, 2020 21:22
@MSeal
Copy link
Contributor Author

MSeal commented Oct 6, 2020

@afshin Any chance you have a little time / knowledge about the cell node reset I describe above with this change?

@afshin
Copy link
Member

afshin commented Oct 7, 2020

Hi @MSeal! I'm sorry, I am not very familiar with what is happening here. I wonder if there is something in the TS typings of the front-end that does the ID stripping. Or have you already isolated it and established it's the server that's doing it?

@MSeal
Copy link
Contributor Author

MSeal commented Oct 7, 2020

@afshin I haven't narrowed it down, except that it happens in both lab and classic, and since classic is an older setup it made me suspicious that it was a server issue and not a frontend one. I could check this by seeing what payload reaches the UI over the network... that'd probably narrow it down. I tried digging through server code to see what might be causing it but I couldn't find any obvious culprit.

@MSeal
Copy link
Contributor Author

MSeal commented Oct 7, 2020

Maybe @jasongrout might know here? I'll check if the cell id reaches the front-end contents payload later today when I get a minute to do so

@rgbkrk
Copy link
Member

rgbkrk commented Oct 15, 2020

This isn't anything to do with additionalProperties being set to false while the allowed fields didn't include the ID is it? There could be a different version of the nbformat schema used at validation time causing the disappearance issue.

@MSeal
Copy link
Contributor Author

MSeal commented Oct 15, 2020

Oh that does look like it is the source of the problem. I'm having difficulty getting a local installation to use my local npm package for nbformat to prove the described issue resolves once that's available. I don't actually even see a reference to nbformat-schema in the jupyter/notebook package info to point to my local installation. @rgbkrk do you know how to get that injected for classic? I'll take a look at lab next to see if I can try it there.

@MSeal
Copy link
Contributor Author

MSeal commented Oct 15, 2020

Hmm I don't seem to know these front-ends enough to successfully patch them. It does look like lab has it's own clone of nbformat baked into the project but modifying that to add cell id alone doesn't appear to be sufficient to persist ids as they get read.

I don't think the front-ends ignoring id fields and having them be auto-replaced is a blocker to merging / releasing nbformat given the current behavior. I'm going to be unavailable for a while starting tomorrow, so if folks could help take over merging / releasing this PR once there's some thumbs up from anyone who maintains lab and classic I'd appreciate it.

@rgbkrk
Copy link
Member

rgbkrk commented Oct 15, 2020

There might be more than one spot to hit this within notebook.js and cell.js --

The nbformat_minor is set to 1 in notebook.js:
https://github.com/jupyter/notebook/blob/2ba296039aa70afbfd90630fea14bb8461b40410/notebook/static/notebook/js/notebook.js#L145, so once we're ready to do the (almost) full thing in classic, then we'll want to bump this.

However... The main issue appears to be the toJSON and fromJSON functions on the Cell classes (and subclasses). Note the default behavior on toJSON:

https://github.com/jupyter/notebook/blob/1550410229e302ecf74d5bf82d5caba6e6296a64/notebook/static/notebook/js/cell.js#L489

We need to do something like data.id = this.id which would mean we'd also be reading it in the fromJSON call: https://github.com/jupyter/notebook/blob/1550410229e302ecf74d5bf82d5caba6e6296a64/notebook/static/notebook/js/cell.js#L510

You can tell it's working from the server because it doesn't give that big warning banner about the notebook validation (which does happen server side).

@willingc
Copy link
Member

@jasongrout @afshin @rgbkrk Any updates on lab and classic? It would be great to get this merged and we can wait on cutting the next nbformat release if needed for lab or classic. FYI/ @MSeal

@willingc willingc mentioned this pull request Nov 21, 2020
6 tasks
@blois
Copy link

blois commented Nov 25, 2020

I was looking into supporting this for Colab and am wondering what the order of operations is- are editors implementing and releasing support for the 4.5 schema before it is officially merged into the spec or should implementations wait for this to be merged in?

@teucer
Copy link

teucer commented Nov 27, 2020

Planning also to use this functionality for cell level comments. Any timelines, please?

@MSeal
Copy link
Contributor Author

MSeal commented Nov 28, 2020

@blois We've adopted the new format ahead of time for Noteable. I was hoping lab maintainers would put a say on this before I went on paternity leave in case they wanted to get ahead. I am thinking of just merging and releasing at this point as it doesn't break anything in lab/classic, just replaces ids periodically because those front-ends have their own management of cell structures outside of nbformat in the UI layers. @willingc @rgbkrk opinions on just merging and getting this out of pending status?

@willingc
Copy link
Member

@MSeal I would say go ahead and merge with and then release as in #199.

@goanpeca
Copy link
Contributor

Hi @willingc, fixed the merge conflicts. Should be good now :)

@blois
Copy link

blois commented Dec 2, 2020

For reference I was planning on supporting this in two phases- first would support cell IDs if the notebook was version 4.5 but not create new notebooks with version 4.5. Once most tools performing strict nbformat validation have updated to support 4.5 then we can start generating these files.

@jasongrout
Copy link
Member

FYI, I think we can work on this for our JupyterLab 3.1 release.

I think the plan @blois puts forth makes sense. Since a 4.5 format file does not validate as a 4.4 file (if I understand correctly), we'll need time for people to upgrade their nbformat package and tools to support the format before we can start generating them by default.

@rgbkrk
Copy link
Member

rgbkrk commented Dec 15, 2020

Thank you for moving this forward!

@MSeal
Copy link
Contributor Author

MSeal commented Dec 17, 2020

@rgbkrk Should I kick off the npm and pypi release this weekend? I could also do a beta release first so folks could use it without auto-upgrading while they prepare releases.

@goanpeca
Copy link
Contributor

Hi @MSeal, nice to have you back 👶🏽 :)

So @rgbkrk provided me access to PyPI and npm, so I really wanted to have #195 in to test it to make the releases.

Would you like to meet over the weekend and go over the process together? otherwise I would be happy to merge and release a Beta to try things out.

@MSeal
Copy link
Contributor Author

MSeal commented Dec 18, 2020

Yea, let's meet this weekend @goanpeca. Afternoons US/Pacific time zone is best for me. Shoot me an email or ping me on the nteract slack and we can pick a time

@goanpeca
Copy link
Contributor

I do not think I have access to the nteract slack 🙃?

Afternoons US/Pacific time zone is best for me

Sounds goo, will send an invite.

@goanpeca
Copy link
Contributor

goanpeca commented Dec 18, 2020

Just sent an invite, @MSeal does that work?

@MSeal
Copy link
Contributor Author

MSeal commented Dec 18, 2020

Can we do two hours earlier or on Saturday?

The nteract slack is public fyi: https://nteract.slack.com/

@willingc
Copy link
Member

Thanks for picking this up @MSeal

@MSeal
Copy link
Contributor Author

MSeal commented Dec 20, 2020

@blois @jasongrout @rgbkrk FYI there's a beta0 release now with these changes for npm and pypi. @goanpeca got the bump2version all working today to support beta and prod releases automatically with tags / github actions and we used to it today to do the beta release.

I was planning to let the beta sit until early Jan and then do the full release as 5.0.9 without any beta indicators if that sounds reasonable to folks.

@jasongrout
Copy link
Member

Just curious, is there a specific reason the cell ids were not added to unrecognized cells as top-level fields, along with cell_type and metadata? I could see it both ways, but it would seem nice and consistent for tooling if all cells had ids, not just the three official types.

@willingc
Copy link
Member

@jasongrout At the point where @MSeal and I wrote the JEP, we wrote it with the 3 official types specified as a cell in mind. We explicitly focused on the 3 official types since those would be the ones most commonly referenced.

@MSeal
Copy link
Contributor Author

MSeal commented Mar 30, 2021

Echoing @willingc. Additional I think it'd be reasonable to add it to unidentified cells... however that definition is kind of off, because I believe the spec doesn't actually allow for any cells to be anything but the 3 officials types via:

"cell": {
"type": "object",
"oneOf": [
{"$ref": "#/definitions/raw_cell"},
{"$ref": "#/definitions/markdown_cell"},
{"$ref": "#/definitions/code_cell"}
]
},
. Nothing references the unrecognized_cell definition so it's effectively a no-op in the spec.

@jmtayloruk
Copy link

Apologies if this is not the right place to flag this up (a brief search didn't find a more suitable place to raise this, and it's an observation that applies directly to the Cell ID implementation).

I think it would be worth sanitising the dictionary that is used to generate these cell IDs. One notebook I happened to peruse contained "transsexual-death" and "brown-israeli". An auto-generated system with scope for this sort of output just seems to be asking for trouble. I presume there must be methods out there for avoiding potentially controversial combinations? e.g. if whatthreewords had problems like this, I imagine we would have heard about it by now. Theirs are often entertaining, but never seem controversial.

@MSeal
Copy link
Contributor Author

MSeal commented Mar 31, 2021

@jmtayloruk This issue was raised in #216 -- we're looking at how to prune the word list to avoid those issues. If you have input or suggestions after reading the thread we'd be happy to hear. At a minimum we're going to prune the list more aggressively in the near future.

@jmtayloruk
Copy link

jmtayloruk commented Mar 31, 2021

Thanks @MSeal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants