update sankey algorithm to d3-sankey-v0.12.3 #4707

Merged
merged 10 commits into from
Aug 29, 2021

Conversation

Yura52
Contributor

@Yura52 Yura52 commented Nov 26, 2020

Closes #3548

@Yura52
Contributor Author

Yura52 commented Nov 27, 2020

Not ready for review, needs some fixes.

@philippjfr
Member

This is looking great! Can you describe the remaining issues? You mentioned that it doesn't actually seem to have an effect or did I misunderstand?

@Yura52
Contributor Author

Yura52 commented Nov 30, 2020

Can you describe the remaining issues? You mentioned that it doesn't actually seem to have an effect or did I misunderstand?

The issue: my implementation does not work yet :) Need some time to find and fix bug(s).

Now, an unexpected part. I maintained a temporary flag that switches between the master and the new implementations:

improved = param.Boolean(default=False, doc="""Use the improved version.""")

When the job was "completed", I simply removed the flag and the master implementation and opened the PR. What made me believe that my implementation worked (and was equivalent to the master implementation) is a JupyterLab quirk: when an already computed plot variable is the last line of a cell, the computation runs a second time, and for reasons I don't understand the flag reverts to its default value on that second run:

# sankey.py
...
if self.p.improved:
    print('improved')
    <improved implementation>
else:
    print('old')
    <old implementation>

# JupyterLab cell BEGIN
plot = hv.Sankey(..., improved=True)
plot
# JupyterLab cell END
# prints:
# improved
# old (what?)
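
One way to check that the two branches agree without relying on printed output is to run both on the same input and compare the resulting node positions. The sketch below assumes each layout can be reduced to a mapping from node name to a (y0, y1) pair; `layouts_match` and the sample values are hypothetical and not part of HoloViews.

```python
import math


def layouts_match(old, new, tol=1e-9):
    """Return True if two {node: (y0, y1)} mappings agree within `tol`."""
    if old.keys() != new.keys():
        return False
    return all(
        math.isclose(a, b, abs_tol=tol)
        for node in old
        for a, b in zip(old[node], new[node])
    )


# Made-up positions standing in for the output of the two code paths.
old = {"a": (0.0, 53.0), "b": (53.0, 100.0)}
new = {"a": (0.0, 53.0), "b": (53.0, 100.0 + 1e-12)}
shifted = {"a": (0.0, 50.0), "b": (53.0, 100.0)}

print(layouts_match(old, new))
print(layouts_match(old, shifted))
```

Comparing values this way does not depend on how many times the notebook re-runs the computation.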

@Yura52
Contributor Author

Yura52 commented Feb 27, 2021

@philippjfr Hi! The new implementation is ready (sorry for such a long pause), see some screenshots at the end of this comment. It may be a good idea to remind users about the node_padding parameter, because the new implementation can yield "thinner" connections between nodes (see the last two examples below) and increasing node_padding can change that if needed.

The only thing I am not sure about is tests. I fixed them as well, so the following commands succeed when run locally:

flake8 holoviews
python -m unittest holoviews/tests/element/testgraphelement.py 
python -m unittest holoviews/tests/plotting/bokeh/testsankey.py

However, they fail in CI, and from the logs it looks like either my old implementation is being tested against the new tests or my new implementation against the old tests. Do you have any idea what could be going wrong?

Examples

I took examples from here and here. In each pair of pictures, the first shows what the current implementation does and the second shows the result of the new implementation.

UPD: although the first example is OK compared to the current implementation, it could clearly be improved by moving the green and purple nodes to the upper part of the plot and swapping brown connections so that they don't intersect. Given that a plot as complex as the gallery example is rendered "correctly", I don't think this is a bug in my implementation, but there is room for further improvement.

reference_before
reference_after
example_before
example_after
gallery_before
gallery_after

@jnettels

jnettels commented Mar 9, 2021

@Yura52 Thanks for your answer.

My understanding was that the following is a result of your new implementation:
image

Since the vertical alignment had changed in comparison to the current implementation, I was wondering whether it was possible to optimize it in this PR. You already mentioned the possible improvements yourself: "moving the green and purple nodes to the upper part of the plot and swapping brown connections so that they don't intersect." I agree.
Stuff like this is what causes the most confusion when I show my Sankeys to people not familiar with them.

@Yura52
Contributor Author

Yura52 commented Mar 9, 2021

@jnettels

My understanding was that the following is a result of your new implementation:

Yes, it is the result of the new implementation.

Since the vertical alignment had changed in comparison to the current implementation

Do you think this change is an improvement compared to the current implementation? If it is, we should probably start with this incremental change, whose purpose is clear: it synchronizes the implementation with that of d3-sankey. After that we could go further and design a custom algorithm on top of the new implementation. However, if the new plots do not look better, please share your feedback on that; it will also be very helpful.

@jnettels

jnettels commented Mar 9, 2021

Alright, there you go :-)

  • Example 1 "PhD": Your update looks worse in my humble opinion. The green node sticking to the bottom is irritating. I'd prefer the current implementation, even though that is not ideal. I'd hope to see an improvement that minimizes the overlap and thus aligns the green node to the top
  • Example 2 "AB": Seems to be identical
  • Example 3 "Energy System": Seems like the difference may mainly be caused by different values for node_padding, so I am indifferent here

Is it straightforward for you to test the same examples with d3-sankey, to see if the implementations yield the same results? If we can establish that the results are basically the same, then any improvements (e.g. regarding overlap) should maybe first be addressed in d3-sankey (?)

@Yura52
Contributor Author

Yura52 commented May 14, 2021

Alright, there you go :-)

Thanks a lot! Sorry for not getting back, still don't have time to finish the PR.

Is it straightforward for you to test the same examples with d3-sankey?

It turned out to be easy, because I was lucky to find this (I am not sure how to use that tool, I run cells one by one from bottom to top and it feels strange 😄 ): https://observablehq.com/@d3/sankey-diagram
This notebook already contains the "Energy" example!

You can paste the following content into the cell where "data" is defined to test the "PhD" example:

data = {
  const links = [
    {source: "0", target: "1", value: 53.0},
    {source: "0", target: "2", value: 47.0},
    {source: "2", target: "6", value: 17.0},
    {source: "2", target: "3", value: 30.0},
    {source: "3", target: "1", value: 22.5},
    {source: "3", target: "4", value: 3.5},
    {source: "3", target: "6", value: 4.0},
    {source: "4", target: "5", value: 0.45},
  ];
  const nodes = Array.from(new Set(links.flatMap(l => [l.source, l.target])), name => ({name, category: name.replace(/ .*/, "")}));
  return {nodes, links, units: "TWh"};
}

The result looks good, so I think there is a bug somewhere in my implementation:
Screenshot 2021-05-14 at 22 41 54
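
For the HoloViews side of the comparison, the same "PhD" links can be written as Python tuples. The sketch below also computes each node's value as the larger of its total inflow and total outflow, which is, as far as I can tell, how d3-sankey sizes nodes; `node_values` is a hypothetical helper, not a HoloViews or d3-sankey function.

```python
from collections import defaultdict

# Same links as the Observable cell above: (source, target, value).
edges = [
    ("0", "1", 53.0),
    ("0", "2", 47.0),
    ("2", "6", 17.0),
    ("2", "3", 30.0),
    ("3", "1", 22.5),
    ("3", "4", 3.5),
    ("3", "6", 4.0),
    ("4", "5", 0.45),
]


def node_values(edges):
    """Return {node: max(total inflow, total outflow)} for an edge list."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in edges:
        outflow[source] += value
        inflow[target] += value
    nodes = set(inflow) | set(outflow)
    return {node: max(inflow[node], outflow[node]) for node in sorted(nodes)}


values = node_values(edges)
for node, value in values.items():
    print(node, value)
```

Assuming HoloViews is installed and imported as `hv`, the same list should be usable as `hv.Sankey(edges)`, so both tools can be given identical input.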

@Yura52
Contributor Author

Yura52 commented May 31, 2021

@philippjfr Seems like some progress happened: I fixed one bug and now we have a clear improvement for one of the reference examples (the second example in the following list). Below I attach four plots in the following order:

  1. The first example from the reference gallery
  2. The second example from the reference gallery
  3. The example from the gallery
  4. Same as 3, but with node_padding=7 (the new implementation may require some node_padding adjustments on the user side; the alternative is to offer a better default value)

As for 3 and 4, you can also compare the results with this example of what d3-sankey yields for the same input. I can see some differences in link ordering; however, they are not trivial to compare because of the difference in colors.
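
To illustrate why node_padding changes how thick the links look: in a d3-sankey-style layout, a single scale converts flow values to pixels, and it is set by the column that has the least room left once the padding between its nodes is subtracted. The sketch below is a simplified model under that assumption, not the code in this PR; `value_scale`, the plot height, and the column values are made up for illustration.

```python
def value_scale(columns, height, padding):
    """Pixels per unit of flow; the most crowded column sets the scale.

    `columns` is a list of columns, each a list of node values.
    """
    scales = []
    for column in columns:
        # Height left for the nodes after reserving the gaps between them.
        free_height = height - (len(column) - 1) * padding
        scales.append(free_height / sum(column))
    return min(scales)


# Hypothetical node values per column.
columns = [[100.0], [53.0, 47.0], [30.0, 17.0, 22.5, 3.5]]

for padding in (7, 10, 20):
    scale = value_scale(columns, height=300, padding=padding)
    print(f"node_padding={padding}: {scale:.3f} px per unit")
```

In this model a smaller padding leaves more height for the nodes, so every node and link is drawn thicker.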

Plots

reference_0

reference_1

gallery

gallaery_np_7

@jnettels

jnettels commented Jun 2, 2021

I tested your latest changes on some of my sankeys and it is a great improvement! Many thanks to you!

@philippjfr
Member

This is fabulous, thanks @Yura52!

@Yura52
Contributor Author

Yura52 commented Jun 2, 2021

Thanks for the positive feedback! The current state is as follows:

  1. At this commit we have the implementation that got the positive feedback. However, it is slightly different from d3-sankey. It also yields unusually high y-values in test cases (I do not see related problems in the examples I try).
  2. I also added this commit (the latest at this point) that makes the implementation closer to d3-sankey.

It is easier for me to get better Energy plots with the first version. That makes me think there are still differences somewhere between my implementation and d3-sankey (or the Energy data is not a good test case). We can stop at this point and merge the version we like more. Another option is to continue searching for bugs (unfortunately, I cannot guarantee that (1) there are any or (2) I will find them in the near future). What do you think?

@jnettels

jnettels commented Jun 3, 2021

I did a comparison of a961ab6 and 75726bc and I cannot decide what I prefer.

Here are html pages for tests with both versions and the source data as xlsx:
https://gist.github.com/jnettels/6625497fe8f8444850674f8dd46a9b05
Sankey.xlsx

  • Example A: 75726bc is better
  • Example B: 75726bc is much better (feels more "dense", in a good way)
  • Example C: Only very slight differences, no preference
  • Example D: a961ab6 is a nice improvement over the current holoviews release. But 75726bc messed it up in weird ways, with lots of overlaps
  • Example E: Source data is very similar to C and D, but here both a961ab6 and 75726bc are fine, again

So my example "D" is some kind of outlier here, which gets messed up with the latest changes. All in all, I still prefer the content of this PR over the release version.

@Yura52
Contributor Author

Yura52 commented Aug 29, 2021

@philippjfr Hi! I have noticed that the PR is still open. I have just reviewed all the examples provided by @jnettels (thanks!) and I agree that merging the PR makes sense. What do you think?

@philippjfr
Member

Apologies @Yura52, I agree this is an improvement so I'll go ahead and merge now. Really appreciate your efforts and have to apologize this slipped through the cracks.

@philippjfr philippjfr merged commit ae02e4c into holoviz:master Aug 29, 2021
@Yura52
Contributor Author

Yura52 commented Aug 29, 2021

@philippjfr no problem!

@atarashansky

atarashansky commented Sep 7, 2021

This PR broke my sankey diagrams -- edges are all missing now:
bokeh_plot

The edgepaths are there:
bokeh_plot (2)

It's just that I set the line widths to zero to remove the outline. So the issue is that the node heights are messed up (nodes all have a height of zero).

@philippjfr
Member

Can you open an issue?


This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024