update sankey algorithm to d3-sankey-v0.12.3 #4707

Yura52 · 2020-11-26T20:54:08Z

Yura52 · 2020-11-27T10:25:13Z

Not ready for review, needs some fixes.

philippjfr · 2020-11-30T15:31:34Z

This is looking great! Can you describe the remaining issues? You mentioned that it doesn't actually seem to have an effect or did I misunderstand?

Yura52 · 2020-11-30T19:04:01Z

Can you describe the remaining issues? You mentioned that it doesn't actually seem to have an effect or did I misunderstand?

The issue: my implementation does not work yet :) Need some time to find and fix bug(s).

Now, an unexpected part. I maintained a temporary flag that switches between the master and the new implementations:

improved = param.Boolean(default=False, doc="""Use the improved version.""")

When the job was "completed", I simply removed the flag and the master implementation and created the PR. What made me think that my implementation worked (and was equivalent to the master implementation) is that in JupyterLab, when an already computed plot variable is entered as the last line of a cell, the computation happens a second time. And for reasons I don't understand, the flag somehow switches back to its default value:

# sankey.py
...
if self.p.improved:
    print('improved')
    <improved implementation>
else:
    print('old')
    <old implementation>

# JupyterLab cell BEGIN
plot = hv.Sankey(..., improved=True)
plot
# JupyterLab cell END
# prints:
# improved
# old (what?)
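
One way to investigate output like this is to record which caller reached each branch. The sketch below is a self-contained stand-in, not HoloViews code: `layout`, `build_plot`, and `display_plot` are hypothetical names, and the second call deliberately omits the flag to mimic the observed output. Whether that is what actually happens in JupyterLab is not established in this thread.

```python
import traceback


def layout(improved=False, log=None):
    """Hypothetical stand-in for the layout code; records branch and caller."""
    branch = "improved" if improved else "old"
    # extract_stack(limit=2) returns [caller frame, this frame].
    caller = traceback.extract_stack(limit=2)[0].name
    if log is not None:
        log.append((branch, caller))
    return branch


def build_plot(log):
    # First computation: the flag is passed explicitly.
    return layout(improved=True, log=log)


def display_plot(log):
    # Assumed second computation: the flag is not forwarded,
    # so the default (False) applies.
    return layout(log=log)


calls = []
build_plot(calls)
display_plot(calls)
for branch, caller in calls:
    print(f"{branch} (called from {caller})")
```

Logging the caller next to the branch name shows whether the second computation comes from a different code path that rebuilds its parameters from defaults.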

Yura52 · 2021-02-27T14:29:58Z

@philippjfr Hi! The new implementation is ready (sorry for such a long pause), see some screenshots at the end of this comment. It may be a good idea to remind users about the node_padding parameter, because the new implementation can yield "thinner" connections between nodes (see the last two examples below) and increasing node_padding can change that if needed.

The only thing I am not sure about is tests. I fixed them as well, so the following commands succeed when run locally:

flake8 holoviews
python -m unittest holoviews/tests/element/testgraphelement.py 
python -m unittest holoviews/tests/plotting/bokeh/testsankey.py

However, they fail in CI, and from the logs it looks like either my old implementation is being tested against the new tests or my new implementation is being tested against the old tests. Do you have any idea what could be going wrong?
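
The thread does not establish why CI and local runs disagree. One thing that could be checked, on the assumption (not confirmed here) that the tests import an installed copy of the package rather than the checkout, is where the module is actually loaded from. A minimal sketch using only the standard library; `module_origin` is a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" is used only so that the sketch runs anywhere; for the situation
# described above one would pass "holoviews" instead.
origin = module_origin("json")
print(origin)
```

If the printed path points into site-packages rather than the working tree, the tests and the implementation under test come from different versions.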

Examples

I took examples from here and here. In each pair of pictures, the first shows what the current implementation does and the second shows the result of the new implementation.

UPD: although the first example is OK compared to the current implementation, it could clearly be improved by moving the green and purple nodes to the upper part of the plot and swapping the brown connections so that they don't intersect. Given that a plot as complex as the gallery example is rendered "correctly", I don't think this is a bug in my implementation, but there is room for further improvement.

jnettels · 2021-03-09T12:03:51Z

@Yura52 Thanks for your answer.

My understanding was that the following is a result of your new implementation:

Since the vertical alignment had changed in comparison to the current implementation, I was wondering whether it was possible to optimize it in this PR. You already mentioned the possible improvements yourself: "moving the green and purple nodes to the upper part of the plot and swapping brown connections so that they don't intersect." I agree.
Stuff like this is what causes the most confusion when I show my Sankeys to people not familiar with them.

Yura52 · 2021-03-09T12:12:55Z

@jnettels

My understanding was that the following is a result of your new implementation:

Yes, it is the result of the new implementation.

Since the vertical alignment had changed in comparison to the current implementation

Do you think this change is an improvement compared to the current implementation? If it is, we should probably start with this incremental change, which is very clear about what it does (it synchronizes the implementation with that of d3-sankey). After that we could move further and design some custom algorithm on top of the new implementation. However, if the new plots do not look better, please share your feedback on that; it will also be very helpful.

jnettels · 2021-03-09T16:51:52Z

Alright, there you go :-)

Example 1 "PhD": Your update looks worse in my humble opinion. The green node sticking to the bottom is irritating. I'd prefer the current implementation, even though that is not ideal. I'd hope to see an improvement that minimizes the overlap and thus aligns the green node to the top.
Example 2 "AB": Seems to be identical
Example 3 "Energy System": Seems like the difference may mainly be caused by different values for node_padding, so I am indifferent here

Is it straightforward for you to test the same examples with d3-sankey, to see if the implementations yield the same results? If we can establish that the results are basically the same, then any improvements (e.g. regarding overlap) should maybe first be addressed in d3-sankey (?)

Yura52 · 2021-05-14T19:45:37Z

Alright, there you go :-)

Thanks a lot! Sorry for not getting back, still don't have time to finish the PR.

Is it straightforward for you to test the same examples with d3-sankey?

It turned out to be easy, because I was lucky to find this (I am not sure how to use that tool, I run cells one by one from bottom to top and it feels strange 😄 ): https://observablehq.com/@d3/sankey-diagram
This notebook already contains the "Energy" example!

You can paste the following content into the cell where "data" is defined to test the "PhD" example:

data = {
  const links = [
    {source: "0", target: "1", value: 53.0},
    {source: "0", target: "2", value: 47.0},
    {source: "2", target: "6", value: 17.0},
    {source: "2", target: "3", value: 30.0},
    {source: "3", target: "1", value: 22.5},
    {source: "3", target: "4", value: 3.5},
    {source: "3", target: "6", value: 4.0},
    {source: "4", target: "5", value: 0.45},
  ];
  const nodes = Array.from(new Set(links.flatMap(l => [l.source, l.target])), name => ({name, category: name.replace(/ .*/, "")}));
  return {nodes, links, units: "TWh"};
}
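
For a quick sanity check of this data on the Python side, the totals per node can be computed directly from the links. The sketch below assumes, as I understand d3-sankey, that a node's value is the larger of its total incoming and total outgoing flow; `node_values` is a hypothetical helper, not part of HoloViews or d3-sankey.

```python
from collections import defaultdict

# Links of the "PhD" example, copied from the data cell above
# as (source, target, value) tuples.
links = [
    ("0", "1", 53.0),
    ("0", "2", 47.0),
    ("2", "6", 17.0),
    ("2", "3", 30.0),
    ("3", "1", 22.5),
    ("3", "4", 3.5),
    ("3", "6", 4.0),
    ("4", "5", 0.45),
]


def node_values(links):
    """Return {node: max(total incoming, total outgoing)} for the given links."""
    incoming = defaultdict(float)
    outgoing = defaultdict(float)
    for source, target, value in links:
        outgoing[source] += value
        incoming[target] += value
    nodes = sorted(set(incoming) | set(outgoing))
    return {node: max(incoming[node], outgoing[node]) for node in nodes}


values = node_values(links)
for node, value in values.items():
    print(node, value)
```

Under that assumption, a layout that draws any of these nodes with zero height, or with heights out of proportion to these values, would point to a bug in the layout code rather than in the data.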

The result looks good, so I think there is a bug somewhere in my implementation:

Yura52 · 2021-05-31T22:04:43Z

@philippjfr Seems like some progress happened: I fixed one bug and now we have a clear improvement for one of the reference examples (the second example in the following list). Below I attach four plots in the following order:

1. The first example from the reference gallery
2. The second example from the reference gallery
3. The example from the gallery
4. Same as 3, but with node_padding=7 (the new implementation may require some adjustment of node_padding on the user side; the other option is to offer a better default value)

As for 3 and 4, you can also compare the results with this example of what d3-sankey yields for the same input. I can see some difference in link ordering; however, it is not trivial to compare because of the difference in colors.

Plots

jnettels · 2021-06-02T16:46:43Z

I tested your latest changes on some of my sankeys and it is a great improvement! Many thanks to you!

philippjfr · 2021-06-02T20:23:04Z

This is fabulous, thanks @Yura52!

Yura52 · 2021-06-02T22:49:33Z

Thanks for the positive feedback! The current state is as follows:

At this commit we have the implementation that got the positive feedback. However, it is slightly different from d3-sankey. It also yields unusually high y-values in test cases (I do not see related problems in the examples I try).
I also added this commit (the latest at this point) that makes the implementation closer to d3-sankey.

It is easier for me to get better Energy plots with the first version. It makes me think there are still differences somewhere between my implementation and d3-sankey (or the Energy data is not a good test case). We can stop at this point and merge the version we like more. Another option is to continue searching for bugs (unfortunately, I cannot guarantee that (1) there are any or (2) I will find them in the near future). What do you think?

jnettels · 2021-06-03T08:29:52Z

I did a comparison of a961ab6 and 75726bc and I cannot decide what I prefer.

Here are html pages for tests with both versions and the source data as xlsx:
https://gist.github.com/jnettels/6625497fe8f8444850674f8dd46a9b05
Sankey.xlsx

Example A: 75726bc is better
Example B: 75726bc is much better (feels more "dense", in a good way)
Example C: Only very slight differences, no preference
Example D: a961ab6 is a nice improvement over the current holoviews release. But 75726bc messed it up in weird ways, with lots of overlaps
Example E: Source data is very similar to C and D, but here both a961ab6 and 75726bc are fine again

So my example "D" is some kind of outlier here, which gets messed up with the latest changes. All in all, I still prefer the content of this PR over the release version.

Yura52 · 2021-08-29T14:46:49Z

@philippjfr Hi! I have noticed that the PR is still open. I have just reviewed all the examples provided by @jnettels (thanks!) and I agree that merging the PR makes sense. What do you think?

philippjfr · 2021-08-29T15:30:56Z

Apologies @Yura52, I agree this is an improvement, so I'll go ahead and merge now. I really appreciate your efforts and am sorry that this slipped through the cracks.

Yura52 · 2021-08-29T16:26:22Z

@philippjfr no problem!

atarashansky · 2021-09-07T18:24:39Z

This PR broke my sankey diagrams -- edges are all missing now:

The edgepaths are there:

It's just that I set the line widths to zero to remove the outline. So the issue is that the node heights are messed up (nodes all have a height of zero).

philippjfr · 2021-09-07T18:58:36Z

Can you open an issue?

github-actions · 2024-10-24T07:28:51Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Yura52 added 3 commits February 27, 2021 15:02

update sankey algorithm to d3-sankey-v0.12.3

e0d62ea

fix bugs and adjust tests

c935b2b

make flake8 happy on python 2.7

e355795

Yura52 mentioned this pull request Mar 9, 2021

Update Sankey algorithm to match latest d3-sankey #3548

Closed

Yura52 added 4 commits June 1, 2021 00:06

fix THE bug

1817708

merge upstream/master

122440f

convert the while loop back to the for loop

51f65b4

fix tests

a961ab6

Yura52 added 3 commits June 3, 2021 00:41

explain the difference between this implementation and d3-sankey

5d2020a

change the comment

a15d663

make the implementation closer to d3-sankey

75726bc

philippjfr merged commit ae02e4c into holoviz:master Aug 29, 2021

atarashansky mentioned this pull request Sep 7, 2021

Sankey edges disappear for large datasets #5079

Closed

github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024
