Plotting large numbers of sequences/time series together: dealing with fixed length numpy arrays #512
Conversation
…mpy arrays to a pandas dataframe with NaNs separating individual sequences 2.) An example in tseries.ipynb showing the use of this function while plotting thousands of sequences together
@jbednar @philippjfr Can you take a look and let me know what you think?
Thanks for the PR! In most of our use cases for datashading curves, we want to be able to distinguish between the curves, which is only feasible for up to a few dozen curves if we use count_cat to colorize them. Here there doesn't seem to be a way to convey the identity of each curve, but it seems like you have an application in mind where that doesn't matter? E.g. maybe you could talk about how this approach lets you discover underlying periodicities in pseudorandom number generators? That's what it looks like your example is showing.
My specific use case involves looking at 1-2 ms long voltage traces from neurons (action potentials) and determining whether they are coming from the same neuron. Each neuron produces stereotypical action potentials, and plotting all the action potentials recorded on an electrode on top of each other lets us know whether they are coming from one or multiple neurons. So, yes, in my case the exact identity of each curve doesn't really matter. Check out our SciPy paper again for severely overplotted examples of this kind: http://conference.scipy.org/proceedings/scipy2017/narendra_mukherjee.html

I think this sort of use case isn't that uncommon: the original question in #286 was trying to achieve exactly this sort of thing. I just used a pseudorandom number generator as an easy way to generate 'dummy' data of the kind I am plotting. I could just as well have used my specific use case, with action potentials from a neuron as an example, but that would mean including some actual data that I have recorded to make those plots work, and I didn't know how to do that with an IPython notebook. Let me know what you think!
I had forgotten that you were the one with the SciPy paper, which I do remember now! A use case something like that was what I was imagining. But in that case, won't you want to know the identity of the inappropriately sorted curves, the ones with shapes suggesting that they are not action potentials from this neuron, so that you can exclude them from the group? I agree that a visualization like this is a good first step, to at least be able to see them, but if it were my data I'd immediately want to start pulling out the outlier curves and see why they ended up in this bucket inappropriately, which is difficult if I can't identify them.

Maybe in practice what you do is just adjust some threshold, never dealing with individual curves by name or id? In that case, a good visualization would be to overlay a datashaded plot of the traces included by the threshold in one color over a datashaded plot of the excluded ones in another color, adjusting the threshold until the two groups are quite visibly distinct. Doing that shouldn't require anything further from datashader, but it sure seems like it would be helpful to have an example showing a workflow like this. I wonder if there's a good way to do that with synthetic data: synthesize a bunch of curves from different categories, pool them all together, show how to use datashader to see visually that the categories exist, and then adjust thresholds until a clustering algorithm correctly sorts out each category. Hmm; probably too ambitious, so I guess I should just merge this utility as-is and think about that later!
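The threshold-and-overlay workflow suggested above could be sketched roughly as follows. This is not code from the PR; the function name `split_by_threshold` and the peak-amplitude criterion are made up for illustration, and the datashader rendering step is indicated only in comments so the sketch stays dependency-light:

```python
# Hypothetical sketch of the threshold workflow discussed above:
# split candidate waveforms by peak amplitude, then datashade the
# two groups in different colors.
import numpy as np

def split_by_threshold(waveforms, peak_threshold):
    """waveforms: (n, m) array; keep curves whose peak amplitude
    exceeds the threshold, and separate out the rest."""
    peaks = np.abs(waveforms).max(axis=1)
    keep = peaks >= peak_threshold
    return waveforms[keep], waveforms[~keep]

rng = np.random.default_rng(1)
# Synthetic data: 80 spike-like curves plus 20 low-amplitude noise curves.
spikes = 5.0 * np.sin(np.linspace(0, np.pi, 40)) + rng.normal(0, 0.3, (80, 40))
noise = rng.normal(0, 0.5, (20, 40))
included, excluded = split_by_threshold(np.vstack([spikes, noise]), 3.0)
print(included.shape[0], excluded.shape[0])  # → 80 20

# Each group would then be converted to a NaN-separated dataframe and
# rendered separately, e.g. with Canvas.line aggregates shaded in two
# different colors and overlaid, adjusting the threshold until the
# groups look visibly distinct.
```

Adjusting `peak_threshold` interactively and re-rendering the two overlays is the feedback loop described in the comment above.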
Ok, I tidied up the example notebook a bit to remove extraneous changes and to use an example where each datapoint was countable for clarity, and merged it. Thanks for your contribution! |
…h fixed length numpy arrays (#512)

* Added a function to ds.utils to convert sequences stored as 2D numpy arrays to a pandas dataframe with NaNs separating individual sequences
* Added an example in tseries.ipynb showing the use of this function while plotting thousands of sequences together
Added a function in ds.utils to convert time series/sequences stored as 2D numpy arrays to a dataframe with NaN separators between individual sequences. Also added an example in tseries.ipynb showing the use of this function. This is in response to issue #286.
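The NaN-separator idea behind the utility can be sketched in plain numpy/pandas. This is not the exact implementation added to ds.utils (the name `sequences_to_dataframe` is hypothetical); it only illustrates the technique: flatten an `(n_sequences, seq_len)` array into one long x/y column pair, inserting a NaN row after each sequence so a line renderer treats them as separate curves:

```python
# Sketch of the NaN-separator technique (illustrative, not the
# ds.utils implementation).
import numpy as np
import pandas as pd

def sequences_to_dataframe(ys):
    """ys: 2D array, one sequence per row, all the same length."""
    n, m = ys.shape
    x = np.arange(m, dtype=float)
    # Append a NaN column to every sequence, then flatten row-major,
    # so each curve ends with a (NaN, NaN) separator row.
    ys_nan = np.column_stack([ys, np.full(n, np.nan)])
    xs_nan = np.tile(np.append(x, np.nan), n)
    return pd.DataFrame({"x": xs_nan, "y": ys_nan.ravel()})

rng = np.random.default_rng(0)
df = sequences_to_dataframe(rng.normal(size=(1000, 50)))
# 1000 sequences of length 50, each followed by one NaN separator row:
print(len(df))               # → 51000
print(int(df["y"].isna().sum()))  # → 1000
```

The resulting single dataframe can then be passed to datashader's line rendering in one call, rather than plotting thousands of curves individually.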