
Doc cleanup 2 #411

Merged
merged 42 commits into from
Jan 28, 2016

Conversation

jbednar
Member

@jbednar jbednar commented Jan 19, 2016

Ok, there are more doc cleanups ready to merge, after review. These are more extensive than the previous ones, and affect the test data, because cells have been reorganized, changed, deleted, and moved between tutorials. So the best way to review these changes is probably just to look at the affected notebooks (mainly Sampling Data and Columnar Data), see if they need changes locally, then roughly compare to the previous versions of those notebooks. Note that I moved the .table and .dframe section from Sampling Data into Columnar Data, where it seems to fit naturally, so that bit has to be compared between different tutorials. Or you can just look at the final result, and not worry about the previous version; up to you!

@jbednar
Member Author

jbednar commented Jan 19, 2016

Remaining issues that I would like Philipp or Jean-Luc to address:

  1. The test data should presumably be updated on this branch, before merging.

  2. Elements.ipynb: The Bounds section says:

    A bounds is a rectangular area specified as a tuple in (left, bottom,
    right, top) format. It is useful for denoting a region of interest
    defined by some bounds, whereas Box (below) is useful for drawing a
    box at a specific location.
    
    scene * hv.Bounds(0.2) * hv.Bounds((0.45, 0.45, 0.2, 0.2))
    

    How could (0.45,0.45) be (left, bottom) if (0.2,0.2) is (right,top)?
    This example should be changed to actually use lbrt ordering if that's
    what we recommend, or else the docs need to say what ordering is
    actually supported. lbrt would be (0.2, 0.2, 0.45, 0.45) for this plot, right?

  3. Elements: I didn't change this, to avoid changing the reference
    data for such a crucial document, but it seems to me that the initial
    Points and Scatter examples ought to be identical, so that we can see
    the similarities, and then later examples in the same section can
    explicitly show how they differ. I made the text anticipate this
    change already, but I think the actual notebook should also be updated
    in this way.

  4. Sampling Data: Should the tutorial be called Selecting Data? Right
    now the word "sampling" is used within the tutorial to talk both about
    the operation of taking samples, and about data that itself is sampled
    from an underlying continuous distribution. Plus .select() appears to
    be more general, encompassing slice/sample/index, so it would make
    sense for the tutorial to be about that.

  5. Sampling Data: Whether or not the tutorial is called Selecting
    Data, it needs a full explanation of .select(). I've put in a few
    words, but they may well be incorrect, because I totally don't know
    how .select relates to slicing/indexing/sampling. Is .select() the
    underlying implementation, leaving those other concepts to be nothing
    more than convenient syntactic sugar? Is .select() more general than
    those other concepts, supporting something that they do not? Is
    .select() supported in different cases than the others? Are there any
    things that slicing/indexing/sampling can do that .select() cannot?
    When should I choose one of these options to get some particular bit
    of data out of my data structure? Because I don't know the answer to
    any of these questions, I couldn't write that part of the tutorial,
    but it really needs to be there!

  6. Sampling Data: There was previously a long introductory
    section about 1D/2D/3D Element types, and after spending a lot of time
    fixing it and cleaning it up, I finally decided that I didn't see how
    it would be relevant to users. What I came up with is below, and if
    it fits somewhere please do put it there, but if so please make sure
    that it's clear what users should do with this information -- how
    does it help them get what they want done? Since that was not clear,
    I left it out for now. E.g. do users need to care whether the value
    dimension is assumed to be a continuous manifold? (If so, Scatter and
    ErrorBars should presumably group with Histogram and Bars, not Curves
    and Images as in the previous version; for now I left them in their own
    category in this sketch.)

    ## Supported operations
    
    To understand how to do such selection, we first need to understand
    what basic types of data are involved. HoloViews is built around the
    philosophy that data should be stored in an immediately visualizable
    form, and so all HoloViews data structures are either immediately
    visualizable as a single 1D, 2D, or 3D plot (HoloViews
    [``Element``s](Elements)), or are [``Container``s](Containers) of such
    objects.  I.e., even if the data is highly multidimensional, it will
    be in the form of some dimensions that can be viewed directly (as
    HoloViews ``Element``s), potentially embedded in some larger space
    that can only be viewed by laying out the ``Element``s spatially or
    over time in an animation (see the [Showcase tutorial](Showcase)).
    How to select the data of interest differs for various ``Element``
    types because of the different types of data they hold, which we will
    explain first:
    
    
    ### Discretely sampled continuous functions
    
    **1D**: ``Curve``, ``Spread``
    **2D**: ``Raster``, ``Image``, ``RGB``, ``HSV``, ``Surface``, ``TriSurface``
    
    Each of these Elements contains data that is interpreted as discrete
    samples from an underlying continuous function, with one dependent
    value for one or two independent variables.  For instance, a ``Curve``
    object has datapoints that will be connected visually when plotted,
    because it is expected to be a discrete approximation to a continuous
    curve, and similarly for the 2D Element types here.  All of these
    types except ``TriSurface`` support slicing, indexing, and sampling,
    either because they are 1D (where such operations are always well
    defined), or because they require datapoints to be aligned to a 2D
    grid of independent values (which provides a clear interpretation of
    such operations).  The 2D ``TriSurface`` Element does not require a
    grid, supporting arbitrary independent value locations, but as a
    consequence it will not support slicing, indexing, or sampling over
    the key dimensions.
    
    
    ### Discretely sampled non-continuous functions
    
    **1D**: ``Scatter``, ``ErrorBars``
    
    These Elements contain 1D data with no assumption about how subsequent
    data points should be connected, allowing arbitrary order of points.
    
    
    ### Binned or categorical data
    
    **1D**: ``Histogram``, ``Bars``
    **2D**: ``HeatMap``, ``QuadMesh``
    
    These Elements represent binned or categorical data in a one- or
    two-dimensional space.
    
    ### Raw coordinates in continuous space
    
    **1D**: ``Distribution``
    **2D**: ``Points``, ``Path``, ``Contours``, ``Polygons``
    **3D**: ``Scatter3D``
    
    These Elements are meant for data that has not been discretely sampled
    or binned, instead representing arbitrary coordinates in a 1D, 2D, or
    3D space, with no assumption of an underlying continuous space.
    
    Finally, the ``Table`` element supports n-dimensional data of any
    kind, with no assumptions about what it represents.  Note that
    even though the other above types are restricted to 1D, 2D, or 3D
    data, they are used for n-dimensional data as well, by embedding them
    in multidimensional [Container](Containers) types.  The restriction to
    1D/2D/3D for each Element is merely an indication of the default way
    in which the data will be displayed onscreen using a 2D display
    device, not a limitation of the data itself.
    
  7. Columnar Data: We need to include a source and/or link for
    "we'll load a dataset of some macro-economic indicators for OECD
    countries from 1964-1990", even if we use the copy that's on
    holoviews.org.

  8. Columnar Data: Regarding "we'll suppress the value labels
    applied to a HeatMap by default and substitute it for a colorbar.",
    there's no colorbar! If I enable one using "%opts HeatMap
    [colorbar=True]" the layout is messed up, so we need to add more space
    between the Layout items. Should this have been an NdLayout to line
    it up in 1 column and reduce redundant axis labels?

  9. Columnar Data: Regarding:

    extents = (0, 0, 10, 10)
    img = hv.Image(np.random.rand(10, 10), bounds=extents)
    img_coords = hv.Points(img.table(), extents=extents)
    img + img * img_coords * hv.Points([img.closest([(5.1,4.9)])])(style=dict(color='r')) + img.sample([(5.1,4.9)])
    

    I expected to see a table like:

    x    y     z
    5    5     0.66
    

    or

    x    y     z
    5.5  5.5   0.66
    

    Instead I see:

    x    y     z
    5.1  4.9   0.66
    

    I.e., I had expected sample to return the actual coordinates (snapped to
    whatever the genuine values are), not the requested coordinates. Assuming
    the current behavior is intentional, what are the pros and cons of this choice?
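The snapping I expected in item 9 can be sketched in plain NumPy; `snap_to_centers` is a hypothetical helper written for illustration, not part of the HoloViews API:

```python
import numpy as np

def snap_to_centers(value, low, high, n):
    """Snap a requested coordinate to the nearest of n cell centers
    spanning [low, high] (illustrative helper, not HoloViews code)."""
    centers = low + (np.arange(n) + 0.5) * (high - low) / n
    return centers[np.abs(centers - value).argmin()]

# For a 10x10 Image with bounds (0, 0, 10, 10), cell centers lie at
# 0.5, 1.5, ..., 9.5, so a request at (5.1, 4.9) snaps to (5.5, 4.5):
snap_to_centers(5.1, 0, 10, 10)  # -> 5.5
snap_to_centers(4.9, 0, 10, 10)  # -> 4.5
```

Whether sample() should report these snapped coordinates or the requested ones is exactly the open question above.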

@philippjfr
Member

Thanks for going through this, looks like you put a lot of work into it. Should be able to review tomorrow evening.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Composing Data tutorial will probably break the tests also, because I added an example showing that floating-point access to the underlying data does work, unlike what the tutorial previously claimed. Just one cell, with text output, but it's probably still just as broken.

It would be very helpful if someone could look at anything that says "deep indexing" to see if I'm describing it accurately -- I thought we used that term to describe accessing all the way into .data, while the existing use of that term in this file was about accessing across multiple Containers using .select. Are they both deep indexing? If so, maybe we could clarify that; if not, maybe we could make a new name for indexing into an Element.

It would also be helpful if someone could check everything I said in the section where we look up the value at 5.2; I wasn't able to find an example of doing that that didn't work, but the previous text claimed there are some, so it would be good to be precise here about which Elements will and will not work for approximate floating-point index values.

@philippjfr
Member

It would be very helpful if someone could look at anything that says "deep indexing" to see if I'm describing it accurately -- I thought we used that term to describe accessing all the way into .data, while the existing use of that term in this file was about accessing across multiple Containers using .select. Are they both deep indexing?

Indexing into an Element to get a numeric value is usually not what we call deep indexing; that term is usually reserved for using getitem or select to index into nested containers.

It would also be helpful if someone could check everything I said in the section where we look up the value at 5.2; I wasn't able to find an example of doing that that didn't work, but the previous text claimed there are some, so it would be good to be precise here about which Elements will and will not work for approximate floating-point index values.

Still traveling at the moment, so I can't look in detail, but I'll list the conditions under which you won't get a single value:

  1. When you do not supply indices for all the key dimensions.

  2. If there are multiple value dimensions, it will only return a scalar if you select the dimension by name, e.g. when indexing a Scatter with vdims=['y', 'z'], scatter[5, 'y'] will return a scalar, while scatter[5] will not.

  3. If there are multiple entries with the same key dimensions values, e.g. a Scatter Element with two points at x=5.
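Conditions 2 and 3 can be illustrated with a toy, dict-of-columns stand-in (purely illustrative; this is not how HoloViews implements indexing):

```python
# Toy Scatter-like data with kdim 'x' and vdims 'y' and 'z':
data = {'x': [4, 5, 5], 'y': [1.0, 2.0, 3.0], 'z': [0.1, 0.2, 0.3]}

def index(data, key, vdim=None):
    """Return a scalar only for a unique key plus a named value dimension;
    otherwise return the matching rows (conditions 2 and 3 above)."""
    rows = [i for i, x in enumerate(data['x']) if x == key]
    if vdim is None or len(rows) != 1:
        return [(data['y'][i], data['z'][i]) for i in rows]
    return data[vdim][rows[0]]

index(data, 4, 'y')  # -> 1.0 (scalar: unique key, named vdim)
index(data, 4)       # -> [(1.0, 0.1)] (no vdim named: not a scalar)
index(data, 5, 'y')  # -> [(2.0, 0.2), (3.0, 0.3)] (duplicate x=5)
```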

@jbednar
Member Author

jbednar commented Jan 20, 2016

Ok, I can remove the claim that indexing into .data is deep indexing; is there a name we should use for that instead?

For your other answer, I think you're answering a different question than the one I have. What I meant is that using [5.2] to get element 52 of the array in that example works fine, but using [5.2005] also works and gives the same result, as does [5.23]. I.e., the index does not need to be the precise floating point value of the key, as the tutorial previously claimed, because the indexing will just return the closest value that is defined. So, when will this work? I put in some words saying when I thought such approximate indexing would work, but please check that those words are correct and change them if necessary.
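A minimal sketch of the approximate indexing being described, as a hypothetical nearest-key lookup (not the actual HoloViews implementation):

```python
import numpy as np

def lookup_nearest(keys, values, x):
    """Return the value whose key is closest to x, so the index need not
    match any key exactly (illustrative only, not HoloViews code)."""
    keys = np.asarray(keys)
    return values[int(np.abs(keys - x).argmin())]

xs = np.linspace(0, 10, 101)  # samples at 0.0, 0.1, ..., 10.0
ys = xs ** 2

# 5.2, 5.2005, and 5.23 all snap to the sample at 5.2:
lookup_nearest(xs, ys, 5.2005)  # same result as lookup_nearest(xs, ys, 5.2)
```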

@philippjfr
Member

Ah right. That will work for all 1D (i.e. one kdim) elements.

@jbednar
Member Author

jbednar commented Jan 20, 2016

But not for Image and Surface, even though it's well defined in that case?

@philippjfr
Member

Ah sorry for those too, I think Image, Surface and QuadMesh are the only 2d types where it will snap.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Bokeh tutorial will also probably force test data to be regenerated, as I made some small but (to my mind :-) important changes.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Bokeh_Elements tutorial says:

The marker shape specified above can be any supported by [matplotlib](http://matplotlib.org/api/markers_api.html), e.g. ``s``, ``d``, or ``o``; the other options select the color and size of the marker.

Is this true even for bokeh? It does ok with the ones in the example, but is there a more definitive marker list for bokeh?

I added the sentence "Almost all matplotlib Element types may be projected onto a polar axis by supplying projection='polar' as a plot option, but polar plots are not currently supported in Bokeh.", because specifying projection='polar' gives a warning and has no effect for the bokeh backend, but if there's actually some way to get polar plots in Bokeh (as some of their examples seem to suggest) we should say how to do it here.

@jbednar
Member Author

jbednar commented Jan 20, 2016

Pandas_Conversion currently includes a bunch of material that is now in the Columnar_Data tutorial, and the rest is generating warnings. I left everything untouched after "Conversion to complex HoloViews components", because it's clearly a duplication, and on the rest I just did a light-touch editing job in case that text is still useful. @philippjfr will need to look at what remains, deleting the duplication and reworking the remainder, or just deleting the whole tutorial if no longer relevant. Some of it does seem relevant, since it explains a bit about how the Pandas interaction works, or I would have just deleted it myself.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Pandas_Seaborn tutorial needs attention similar to the Pandas_Conversion tutorial, i.e. avoiding deprecation warnings and focusing on what we currently recommend that users do, if not DFrame. Also, Out[3] raised an exception both when I ran it and on the public website, which is replaced with a warning when the "label=" options are removed, so I left it as-is for someone else to correct. In[6] also has an error if the "[joint=True]" option is specified, which again needs addressing.

@philippjfr
Member

The Bokeh tutorial will also probably force test data to be regenerated, as I made some small but (to my mind :-) important changes.

Bokeh tutorials are currently untested as we somehow have to sanitize the json to ignore various ids.

Re: Markers. Is this true even for bokeh? It does ok with the ones in the example, but is there a more definitive marker list for bokeh?

By default bokeh supports none of the short marker identifiers, but I did implement an mpl -> bokeh compatibility function, which among other things translates mpl markers to their bokeh equivalents. We'll have to decide whether to keep this or not.

I added the sentence Almost all matplotlib Element types may be projected onto a polar axis by supplying projection='polar' as a plot option, but polar plots are not currently supported in Bokeh.

I'll have to look into bokeh support for polar projections, if so could you point me to the examples?

Pandas_Conversion currently includes a bunch of material that is now in the Columnar_Data tutorial, and the rest is generating warnings. I left everything untouched after "Conversion to complex HoloViews components", because it's clearly a duplication, and on the rest I just did a light-touch editing job in case that text is still useful.

I'll have a look at what can be saved and move it into other tutorials, as it's probably not worth keeping this tutorial around.

@philippjfr
Member

extents = (0, 0, 10, 10)
img = hv.Image(np.random.rand(10, 10), bounds=extents)
img_coords = hv.Points(img.table(), extents=extents)
img + img * img_coords * hv.Points([img.closest([(5.1,4.9)])])(style=dict(color='r')) + img.sample([(5.1,4.9)])

Yes, this seems to reveal two bugs: first, I didn't account for snapping in 1D in sample; second, this shouldn't work at all, because 2D Points shouldn't be snapping even when only a 1D index is supplied. This example should be using Scatter instead.

@jbednar
Member Author

jbednar commented Jan 21, 2016

I think that the helper function to translate markers to bokeh seems useful, so I'd keep it unless it's an egregious hack (which it doesn't sound like).

The polar histogram example plot I saw in bokeh was at http://bokeh.pydata.org/en/latest/docs/gallery/burtin.html, but looking more closely at it, I don't see any usable polar-histogram implementation there; it's just a very low-level example of drawing wedges in a circle. So if someone needs polar histograms using the bokeh backend, we'd have to submit something to bokeh to implement them properly first.

What's the long-term plan for the Elements Tutorial, wrt backends? I've already made various edits to the matplotlib version that should presumably be mirrored into the Bokeh version, plus the Bokeh version has a few changes relative to the matplotlib version (plus apparently some missing Elements). Should we set up some automatic mechanism for generating a version specific to each backend, by applying patches to one master version? Presumably the master version would be easiest to maintain if it's an actual runnable notebook, i.e. the matplotlib version, perhaps with some markdown sections for code only for other versions, which would then be enabled during patching?

@jbednar
Member Author

jbednar commented Jan 21, 2016

BTW, is this statement from Elements really true?

Such a plot wouldn't be meaningful for Scatter, but is a valid use for Points, where the x and y locations are independent variables representing coordinates, and the "data" is conveyed by the size and color of the dots

Seems like a Scatter could easily be plotting two or three value dimensions, e.g. if someone measured the elevation and size of something distributed at regular intervals along a horizontal length, leading to a Scatter plot with height on y and size as dot size.

@jbednar
Member Author

jbednar commented Jan 21, 2016

The functionality of the TableConversion class may be conveniently accessed using the .to property, which should have its own tutorial someday, but hopefully this will get the idea across:

Is it still true that this is undocumented?

Internally, Path Element types hold a list of Nx2 arrays, specifying the x/y-coordinates along each path.

Is that still true? If not please update. E.g. what else can they be internally?

Also, I added an example of a non-gridded VectorField, as requested by someone (can't find this issue anymore!), which of course will break the tests for Elements. Feel free to improve that example, e.g. to make it more colorful or to show lines instead of arrows.

@jbednar
Member Author

jbednar commented Jan 26, 2016

Should index.rst be updated to say we support Python 3.5 now? If so we need a 3.5 test on Travis. Then again, it says we support Python 3.3, and there's only a 3.4 test on Travis, so I don't know how we can know if 3.3 support is true...

@philippjfr
Member

Since we want the documentation to be updated asap, I'll be updating the reference_data now and merge this PR. Any further fixes can be made in future PRs.

@jbednar
Member Author

jbednar commented Jan 27, 2016

Great, thanks! We need to make sure to visually check the generated output for the files whose reference data changed, to make sure we don't miss some subtle change in behavior.

@philippjfr
Member

I've gone through all the reference_data and it all looks fine so I've updated it and will merge now. I'll reopen this PR shortly and go through your other suggestions. Buildbot will push the updated website to dev.holoviews.org in a few minutes and I'll update the main website to match.

philippjfr added a commit that referenced this pull request Jan 28, 2016
@philippjfr philippjfr merged commit ef42361 into master Jan 28, 2016
@jlstevens jlstevens deleted the doc-cleanup branch February 4, 2016 16:07