Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding displayHTML to allow HTML output to the notebook #122

Merged
merged 10 commits into from
Dec 13, 2022

Conversation

janpfeifer
Copy link
Contributor

@janpfeifer janpfeifer commented Oct 17, 2022

hi,

I was needing this for a project I'm working on, so I thought I would put a PR together.

Included changes:

  • Added support for HTML output, using same trick of pointing to a temporary file in tmp/. Added matching 'displayHTML' bash function.
  • Only delete file if it's actually in tmp: this way one can show an already existing image without deleting it.
  • Refactored images.py to display.py and made it generic, so in principle it can support various rich content types. For now only 'images' and 'html'.
  • Export ''NOTEBOOK_BASH_KERNEL_CAPABILITIES" with the set of supported capabilities, so programs can conditioned their output to that (and they can know they are running inside a notebook+bash_kernel).
  • Added documentation to README.rst and in the code.
  • Add support for lines terminated in '\r', so progress-bar style programs can work. Something like:
  ((ii=0))
  while ((ii<5)) ; do
    printf  "${ii}\r"
    sleep 1
    ((ii+1))
  done

With this PR, this will display one number at a time, updating every second, as opposed to just display "4" after 5 seconds.

cheers
Jan

@janpfeifer janpfeifer mentioned this pull request Oct 22, 2022
@kdm9
Copy link
Collaborator

kdm9 commented Oct 23, 2022

On a quick first pass, this looks great! Thanks heaps. I'll review it over the next week and request some minor changes.

@janpfeifer
Copy link
Contributor Author

janpfeifer commented Oct 23, 2022

Many thanks @kdm9 . Since you are looking over it next week, I added a couple of commits with another couple added features:

  • Added support for javascript;
  • Added support for update_display_data (as opposed to the usual display_data) which allows dynamic update of content (HTML, images, Javascript).

A quick example of it in action:

    display_id="id_${RANDOM}"
    ((ii=0))
    while ((ii < 10)) ; do
        echo "<div>${ii}</div>" | displayHTML $display_id
        ((ii = ii+1))
        sleep 1
    done

In my application I'm using it to let an ML program (non-python) drive the creation of a plot (with Chart.js, but could have been D3 or any other), as results come in. It's looking neat :)

@janpfeifer
Copy link
Contributor Author

Ping ?

Not urgent, but it would be nice to have it included :)

I've been using it a lot with interactive programs (progressbars) and programs with fancy plotting output. It's been really nice.

@ngirard
Copy link

ngirard commented Nov 30, 2022

I'd be much interested as well ;-)

@kdm9
Copy link
Collaborator

kdm9 commented Dec 1, 2022

Sorry, I'm travelling for work at the moment so won't have a chance to attend to this for another couple of weeks. In broad terms it looks good to me though, so I'd anticipate it being merged this year. Sorry for the delay. Perhaps @takluyver has comments in the mean time?

@janpfeifer
Copy link
Contributor Author

No probs. Take care on your trip, we can move forward with it when you are back.

@ngirard
Copy link

ngirard commented Dec 2, 2022

Exactly ! Thanks for the heads up @kdm9 and take your time.
If you happen to be visiting Paris, France, just let me know: I'll be glad to buy you a coffee ;-)

@janpfeifer
Copy link
Contributor Author

@takluyver
Copy link
Owner

No particular comments from me, but I like the idea, and I think the implementation looks OK at a quick glance. 🙂

@janpfeifer
Copy link
Contributor Author

Cool, let me know if I should do anything.

Copy link
Collaborator

@kdm9 kdm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. A few minor comments, one concern, and one bit that needs more thought.

The concern: how safe is the javascript functionality here? I imagine that this is something of a security risk, but then so are most other similar things jupyter does. I'm not qualified to comment further, so if @takluyver or anyone watching has concerns I'd like to hear them.

And the open discussion: can we make this code less pathological in cases of very frequent updates? I'm thinking of the very many programs that spam '\r'-terminated updates at a terminal assuming that they will always overwrite each other. With the code as it is now, I think such programs will quickly fill the receiving notebook with progress updates, leading to a massive .ipynb and much browser load. I think the correct response is limiting updates to e.g. one per second, and overwriting the previous line if it ends with '\r', mimicking the perceived behaviour in the terminal to the maximal extent.

bash_kernel/kernel.py Show resolved Hide resolved
bash_kernel/kernel.py Outdated Show resolved Hide resolved
matched = True
break
if not matched:
output_lines.append(line)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps something here like:

if line.endswith("\r") and len(output_lines) > 0:
    output_lines[-1] = line
else:
    output_lines.append(line)

to cause '\r'-terminated lines to overwrite the previous line, to match the perceived behaviour in the terminal (and prevent over-zealous progress updates clogging up the notebook)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think we'd need to check if the previous line ends with '\r', not the current one, so:

if len(output_lines) > 0 and output_lines[-1].endswith("\r"):
    output_lines[-1] = line
else:
    output_lines.append(line)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may actually be pointless, as for this to work we'd need to see all output at once, but actually the line we'd update would already have been sent to the kernel. Perhaps we should still do this, and buffer messages for something like a second, limiting progress updates to one line per second at worst. thoughts @takluyver?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does e.g. tqdm.notebook.tqdm() handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great catch, but there is another issue with this approach:

At least normal terminals \r doesn't erase the previous line (I actually would love if it did, but it doesn't ...), and some folks may rely on that behavior, only overwriting the start of the line for instance (and leaving something static on the right margin)

E.g.: Type this in a terminal:

$ ((ii = 0)); printf '\t\t<- Counter\r' ; while ((ii < 5)) ; do printf '  %s\r' $ii; ((ii=ii+1)) ; sleep 1 ; done ; echo

Wdyt ?

I'll leave this as unresolved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one could do sth like

if len(output_lines) > 0 and output_lines[-1].endswith("\r"):
    last = output_lines[-1]
    if len(last) > len(line):
        line = line[:-1] + last[len(line)-1:-1] + line[-1]  # double check this, I might have a fencepost error here
    output_lines[-1] = line
else:
    output_lines.append(line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could ... but is it worth it ?

I mean, then there would be two different places managing logic on how to overwrite text after a CR (carriage-return), and they activate in an apparently non-deterministic way : if we don't do anything, all the logic will be one place, within Jupyter notebook's rendering.

From our side it's not a big deal, and we can definitely implement it ... but we may be creating subtle (and hard to debug) trouble for someone in the future trying to change how CR are handled.

Any thoughts ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you're probably right. I've just been burned by my previous solution to this exact problem, which is to | tr '\r' '\n' any progress-y outputs, which inevitably crashes my browser when some program updates a bit too frequently. Perhaps we should raise this with upstream and have it fixed once and for all.

bash_kernel/display.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@kdm9 kdm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janpfeifer thanks for the changes, and for the PR. I'm happy to merge, but I'll leave it open another day in case @takluyver or any other interested parties have anything to say.

@janpfeifer
Copy link
Contributor Author

janpfeifer commented Dec 12, 2022

Btw, on Javascript security: definitely this opens up the user's browser to run any javascript a program in the host (running the notebook kernel) may want. In principle this doesn't sound more or less dangerous than opening any page in the web -- it will be able to run any javascript it wants.

There may be some considerations about a program in the host being able to craft a javascript that will trigger the run of another cell, in the same host ... but then the program could itself another program in the same host, since it's already running there ...

It doesn't ring any extra alarms to me -- Notebooks are extremely risky by nature, anyone with a connection to the notebook can run anything in that host :\ ...

On the '\r' overloading issue: I see what you mean ... I mean, if someone would do a cat /dev/random, it will overload the .ipynb anyway, but it will be visible ... while the '\r' could be invisible. But I feel a solution to this should be upstream, where these things are rendered. Notice bash_kernel only sees a fraction of the output at a time, and dispatches it to the kernel. At most I suppose we could merge a few lines per batch, the problem would still be there -- I'm guessing here, maybe if someone outputs with no pause, bash_kernel could see a larger amount of lines at a time.

@kdm9
Copy link
Collaborator

kdm9 commented Dec 12, 2022

Btw, on Javascript security: definitely this opens up the user's browser to run any javascript a program in the host (running the notebook kernel) may want. In principle this doesn't sound more or less dangerous than opening any page in the web -- it will be able to run any javascript it wants.

There may be some considerations about a program in the host being able to craft a javascript that will trigger the run of another cell, in the same host ... but then the program could itself another program in the same host, since it's already running there ...

It doesn't ring any extra alarms to me -- Notebooks are extremely risky by nature, anyone with a connection to the notebook can run anything in that host :\ ...

The exact conclusion I reached. Let's not worry too hard about remote code execution while writing our program to execute code remotely :)

On the '\r' overloading issue: I see what you mean ... I mean, if someone would do a cat /dev/random, it will overload the .ipynb anyway, but it will be visible ... while the '\r' could be invisible. But I feel a solution to this should be upstream, where these things are rendered. Notice bash_kernel only sees a fraction of the output at a time, and dispatches it to the kernel. At most I suppose we could merge a few lines per batch, the problem would still be there -- I'm guessing here, maybe if someone outputs with no pause, bash_kernel could see a larger amount of lines at a time.

Yeah, I agree, though in many cases there's no way (outside &>/dev/null) to prevent some code from updating with insane frequency, whereas one can at least try not to regularly cat /dev/random :)

But let's talk to the folks upstream their thoughts on the \r-spamming issue rather than coming up with something janky here (or even ask @takluyver since he is more involved in core jupyter stuff than I).

@takluyver
Copy link
Owner

On the Javascript security side: the one thing that Jupyter tries to avoid is that when you open a notebook, it shouldn't be able to start running anything (including Javascript, since the Javascript can send code to the kernel), and it shouldn't be able to disguise what will run when you press shift-enter. This is why if you download a notebook it will initially open as 'untrusted' with rich outputs hidden.

This doesn't require anything on the kernel side, though - it's all handled by the notebook server and frontend.

Likewise, carriage return characters (\r) should be handled by the frontend, because as you note, the kernel may have already sent on the text that it wants to overwrite. I don't really know my way around the new Jupyterlab code, but in the old notebook code you can see the relevant bits of code here:

https://github.com/jupyter/notebook/blob/e946154112daa5ef997d7f549953cce91b336d3d/notebook/static/notebook/js/outputarea.js#L535-L570

https://github.com/jupyter/notebook/blob/e946154112daa5ef997d7f549953cce91b336d3d/notebook/static/base/js/utils.js#L477-L504

@kdm9 kdm9 merged commit c5b6b30 into takluyver:master Dec 13, 2022
@kdm9
Copy link
Collaborator

kdm9 commented Dec 13, 2022

Thanks @takluyver. I've merged, given we all agree the remaining problems cant be solved here. @janpfeifer thanks again for the PR, and apologies it took me a while to review.

@kdm9
Copy link
Collaborator

kdm9 commented Dec 13, 2022

And I'll make a release later today after some more comprehensive testing on my real work

@ngirard
Copy link

ngirard commented Feb 20, 2023

Thanks @kdm9 and @janpfeifer for your work !

@janpfeifer, could you please explain how in your example notebook the output from your shell cells are displayed, since I don't see anything beyond the command themselves, while I expected to see e.g. displayHTML

@janpfeifer
Copy link
Contributor Author

I'm happy you enjoyed it @ngirard .

So the fundamental way to display HTML content is in displayHTML, it's just that because I'm coding a rather large project in Go, I would do the same as displayHTML but in the compiled Go program.

You can see the displayHTML bash function here to see how it's done -- it generically defines a display<type> functions. I added some documentation here, it can be coded in any language.

My Go code is not open-sourced yet sadly -- I plan to do that in the next weeks, and it will be visible.

Now, while bash_kernel works great, because I was doing purely Go, in the end I wrote a Jupyter kernel for Go called GoNB-- if that is your language, check it out.

@ngirard
Copy link

ngirard commented Feb 24, 2023

Understood, thank you very much for your clarifications @janpfeifer !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants