
Doc cleanup 2 #411

Merged
merged 42 commits into from
Jan 28, 2016

Conversation

jbednar
Member

@jbednar jbednar commented Jan 19, 2016

Ok, there are more doc cleanups ready to merge, after review. These are more extensive than the previous ones, and affect the test data, because cells have been reorganized, changed, deleted, and moved between tutorials. So the best way to review these changes is probably just to look at the affected notebooks (mainly Sampling Data and Columnar Data), see if they need changes locally, then roughly compare to the previous versions of those notebooks. Note that I moved the .table and .dframe section from Sampling Data into Columnar Data, where it seems to fit naturally, so that bit has to be compared between different tutorials. Or you can just look at the final result, and not worry about the previous version; up to you!

@jbednar
Member Author

jbednar commented Jan 19, 2016

Remaining issues that I would like Philipp or Jean-Luc to address:

  1. The test data should presumably be updated on this branch, before merging.

  2. Elements.ipynb: The Bounds section says:

    A bounds is a rectangular area specified as a tuple in (left, bottom,
    right, top) format. It is useful for denoting a region of interest
    defined by some bounds, whereas Box (below) is useful for drawing a
    box at a specific location.
    
    scene * hv.Bounds(0.2) * hv.Bounds((0.45, 0.45, 0.2, 0.2))
    

    How could (0.45,0.45) be (left, bottom) if (0.2,0.2) is (right,top)?
    This example should be changed to actually use lbrt ordering if that's
    what we recommend, or else the docs need to say what ordering is
    actually supported. lbrt would be (0.2, 0.2, 0.45, 0.45) for this plot, right?

  3. Elements: I didn't change this, to avoid changing the reference
    data for such a crucial document, but it seems to me that the initial
    Points and Scatter examples ought to be identical, so that we can see
    the similarities, and then later examples in the same section can
    explicitly show how they differ. I made the text anticipate this
    change already, but I think the actual notebook should also be updated
    in this way.

  4. Sampling Data: Should the tutorial be called Selecting Data? Right
    now the word "sampling" is used within the tutorial to talk both about
    the operation of taking samples, and about data that itself is sampled
    from an underlying continuous distribution. Plus .select() appears to
    be more general, encompassing slice/sample/index, so it would make
    sense for the tutorial to be about that.

  5. Sampling Data: Whether or not the tutorial is called Selecting
    Data, it needs a full explanation of .select(). I've put in a few
    words, but they may well be incorrect, because I totally don't know
    how .select relates to slicing/indexing/sampling. Is .select() the
    underlying implementation, leaving those other concepts to be nothing
    more than convenient syntactic sugar? Is .select() more general than
    those other concepts, supporting something that they do not? Is
    .select() supported in different cases than the others? Are there any
    things that slicing/indexing/sampling can do that .select() cannot?
    When should I choose one of these options to get some particular bit
    of data out of my data structure? Because I don't know the answer to
    any of these questions, I couldn't write that part of the tutorial,
    but it really needs to be there!

  6. Sampling Data: There was previously a long introductory
    section about 1D/2D/3D Element types, and after spending a lot of time
    fixing it and cleaning it up, I finally decided that I didn't see how
    it would be relevant to users. What I came up with is below, and if
    it fits somewhere please do put it there, but if so please make sure
    that it's clear what users should do with this information -- how
    does it help them get what they want done? Since that was not clear,
    I left it out for now. E.g. do users need to care whether the value
    dimension is assumed to be a continuous manifold? (If so, Scatter and
    ErrorBars should presumably group with Histogram and Bars, not Curves
    and Images as in the previous version; for now I left them in their own
    category in this sketch.)

    ## Supported operations
    
    To understand how to do such selection, we first need to understand
    what basic types of data are involved. HoloViews is built around the
    philosophy that data should be stored in an immediately visualizable
    form, and so all HoloViews data structures are either immediately
    visualizable as a single 1D, 2D, or 3D plot (HoloViews
    [``Element``s](Elements)), or are [``Container``s](Containers) of such
    objects.  I.e., even if the data is highly multidimensional, it will
    be in the form of some dimensions that can be viewed directly (as
    HoloViews ``Element``s), potentially embedded in some larger space
    that can only be viewed by laying out the ``Element``s spatially or
    over time in an animation (see the [Showcase tutorial](Showcase)).
    How to select the data of interest differs for various ``Element``
    types because of the different types of data they hold, which we will
    explain first:
    
    
    ### Discretely sampled continuous functions
    
    **1D**: ``Curve``, ``Spread``
    **2D**: ``Raster``, ``Image``, ``RGB``, ``HSV``, ``Surface``, ``TriSurface``
    
    Each of these Elements contains data that is interpreted as discrete
    samples from an underlying continuous function, with one dependent
    value for one or two independent variables.  For instance, a ``Curve``
    object has datapoints that will be connected visually when plotted,
    because it is expected to be a discrete approximation to a continuous
    curve, and similarly for the 2D Element types here.  All of these
    types except ``TriSurface`` support slicing, indexing, and sampling,
    either because they are 1D (where such operations are always well
    defined), or because they require datapoints to be aligned to a 2D
    grid of independent values (which provides a clear interpretation of
    such operations).  The 2D ``TriSurface`` Element does not require a
    grid, supporting arbitrary independent value locations, but as a
    consequence it will not support slicing, indexing, or sampling over
    the key dimensions.
    
    
    ### Discretely sampled non-continuous functions
    
    **1D**: ``Scatter``, ``ErrorBars``
    
    These Elements contain 1D data with no assumption about how subsequent
    data points should be connected, allowing arbitrary order of points.
    
    
    ### Binned or categorical data
    
    **1D**: ``Histogram``, ``Bars``
    **2D**: ``HeatMap``, ``QuadMesh``
    
    These Elements represent binned or categorical data in a one- or
    two-dimensional space.
    
    ### Raw coordinates in continuous space
    
    **1D**: ``Distribution``
    **2D**: ``Points``, ``Path``, ``Contours``, ``Polygons``
    **3D**: ``Scatter3D``
    
    These Elements are meant for data that has not been discretely sampled
    or binned, instead representing arbitrary coordinates in a 1D, 2D, or
    3D space, with no assumption of an underlying continuous space.
    
    Finally, the ``Table`` element supports n-dimensional data of any
    kind, with no assumptions about what it represents.  Note that
    even though the other above types are restricted to 1D, 2D, or 3D
    data, they are used for n-dimensional data as well, by embedding them
    in multidimensional [Container](Containers) types.  The restriction to
    1D/2D/3D for each Element is merely an indication of the default way
    in which the data will be displayed onscreen using a 2D display
    device, not a limitation of the data itself.
    
  7. Columnar Data: We need to include a source and/or link for
    "we'll load a dataset of some macro-economic indicators for OECD
    countries from 1964-1990", even if we use the copy that's on
    holoviews.org.

  8. Columnar Data: Regarding "we'll suppress the value labels
    applied to a HeatMap by default and substitute it for a colorbar.",
    there's no colorbar! If I enable one using "%opts HeatMap
    [colorbar=True]" the layout is messed up, so we need to add more space
    between the Layout items. Should this have been an NdLayout to line
    it up in 1 column and reduce redundant axis labels?

  9. Columnar Data: Regarding:

    extents = (0, 0, 10, 10)
    img = hv.Image(np.random.rand(10, 10), bounds=extents)
    img_coords = hv.Points(img.table(), extents=extents)
    img + img * img_coords * hv.Points([img.closest([(5.1,4.9)])])(style=dict(color='r')) + img.sample([(5.1,4.9)])
    

    I expected to see a table like:

    x    y     z
    5    5     0.66
    

    or

    x    y     z
    5.5  5.5   0.66
    

    Instead I see:

    x    y     z
    5.1  4.9   0.66
    

    I.e., I had expected sample to return the actual coordinates (snapped to
    whatever the genuine values are), not the requested coordinates. Assuming
    the current behavior is intentional, what are the pros and cons of this choice?
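The snapping I expected in item 9 can be sketched in plain NumPy; `snap_to_centers` is a hypothetical helper written for illustration, not part of the HoloViews API:

```python
import numpy as np

def snap_to_centers(value, low, high, n):
    """Snap a requested coordinate to the nearest of n cell centers
    spanning [low, high] (illustrative helper, not HoloViews code)."""
    centers = low + (np.arange(n) + 0.5) * (high - low) / n
    return centers[np.abs(centers - value).argmin()]

# For a 10x10 Image with bounds (0, 0, 10, 10), cell centers lie at
# 0.5, 1.5, ..., 9.5, so a request at (5.1, 4.9) snaps to (5.5, 4.5):
snap_to_centers(5.1, 0, 10, 10)  # -> 5.5
snap_to_centers(4.9, 0, 10, 10)  # -> 4.5
```

Whether sample() should report these snapped coordinates or the requested ones is exactly the open question above.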

@philippjfr
Member

Thanks for going through this, looks like you put a lot of work into it. Should be able to review tomorrow evening.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Composing Data tutorial will probably break the tests also, because I added an example showing that floating-point access to the underlying data does work, unlike what the tutorial previously claimed. Just one cell, with text output, but it's probably still just as broken.

It would be very helpful if someone could look at anything that says "deep indexing" to see if I'm describing it accurately -- I thought we used that term to describe accessing all the way into .data, while the existing use of that term in this file was about accessing across multiple Containers using .select. Are they both deep indexing? If so, maybe we could clarify that; if not, maybe we could make a new name for indexing into an Element.

It would also be helpful if someone could check everything I said in the section where we look up the value at 5.2; I wasn't able to find an example of doing that that didn't work, but the previous text claimed there are some, so it would be good to be precise here about which Elements will and will not work for approximate floating-point index values.

@philippjfr
Member

It would be very helpful if someone could look at anything that says "deep indexing" to see if I'm describing it accurately -- I thought we used that term to describe accessing all the way into .data, while the existing use of that term in this file was about accessing across multiple Containers using .select. Are they both deep indexing?

Indexing into an Element to get a numeric value is usually not what we call deep indexing; that term is usually reserved for using getitem or select to index into nested containers.

It would also be helpful if someone could check everything I said in the section where we look up the value at 5.2; I wasn't able to find an example of doing that that didn't work, but the previous text claimed there are some, so it would be good to be precise here about which Elements will and will not work for approximate floating-point index values.

Still traveling at the moment, so I can't look in detail, but I'll list the conditions under which you won't get a single value:

  1. When you do not supply indices for all the key dimensions.

  2. If there are multiple value dimensions, it will only return a scalar if you select the dimension by name, e.g. when indexing a Scatter with vdims=['y', 'z'], scatter[5, 'y'] will return a scalar, while scatter[5] will not.

  3. If there are multiple entries with the same key dimensions values, e.g. a Scatter Element with two points at x=5.
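Conditions 2 and 3 can be illustrated with a toy, dict-of-columns stand-in (purely illustrative; this is not how HoloViews implements indexing):

```python
# Toy Scatter-like data with kdim 'x' and vdims 'y' and 'z':
data = {'x': [4, 5, 5], 'y': [1.0, 2.0, 3.0], 'z': [0.1, 0.2, 0.3]}

def index(data, key, vdim=None):
    """Return a scalar only for a unique key plus a named value dimension;
    otherwise return the matching rows (conditions 2 and 3 above)."""
    rows = [i for i, x in enumerate(data['x']) if x == key]
    if vdim is None or len(rows) != 1:
        return [(data['y'][i], data['z'][i]) for i in rows]
    return data[vdim][rows[0]]

index(data, 4, 'y')  # -> 1.0 (scalar: unique key, named vdim)
index(data, 4)       # -> [(1.0, 0.1)] (no vdim named: not a scalar)
index(data, 5, 'y')  # -> [(2.0, 0.2), (3.0, 0.3)] (duplicate x=5)
```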

@jbednar
Member Author

jbednar commented Jan 20, 2016

Ok, I can remove the claim that indexing into .data is deep indexing; is there a name we should use for that instead?

For your other answer, I think you're answering a different question than the one I have. What I meant is that using [5.2] to get element 52 of the array in that example works fine, but using [5.2005] also works and gives the same result, as does [5.23]. I.e., the index does not need to be the precise floating point value of the key, as the tutorial previously claimed, because the indexing will just return the closest value that is defined. So, when will this work? I put in some words saying when I thought such approximate indexing would work, but please check that those words are correct and change them if necessary.
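A minimal sketch of the approximate indexing being described, as a hypothetical nearest-key lookup (not the actual HoloViews implementation):

```python
import numpy as np

def lookup_nearest(keys, values, x):
    """Return the value whose key is closest to x, so the index need not
    match any key exactly (illustrative only, not HoloViews code)."""
    keys = np.asarray(keys)
    return values[int(np.abs(keys - x).argmin())]

xs = np.linspace(0, 10, 101)  # samples at 0.0, 0.1, ..., 10.0
ys = xs ** 2

# 5.2, 5.2005, and 5.23 all snap to the sample at 5.2:
lookup_nearest(xs, ys, 5.2005)  # same result as lookup_nearest(xs, ys, 5.2)
```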

@philippjfr
Member

Ah right. That will work for all 1D (i.e. one kdim) elements.

@jbednar
Member Author

jbednar commented Jan 20, 2016

But not for Image and Surface, even though it's well defined in that case?

@philippjfr
Member

Ah sorry for those too, I think Image, Surface and QuadMesh are the only 2d types where it will snap.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Bokeh tutorial will also probably force test data to be regenerated, as I made some small but (to my mind :-) important changes.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Bokeh_Elements tutorial says:

The marker shape specified above can be any supported by [matplotlib](http://matplotlib.org/api/markers_api.html), e.g. ``s``, ``d``, or ``o``; the other options select the color and size of the marker.

Is this true even for bokeh? It does ok with the ones in the example, but is there a more definitive marker list for bokeh?

I added the sentence "Almost all matplotlib Element types may be projected onto a polar axis by supplying projection='polar' as a plot option, but polar plots are not currently supported in Bokeh.", because specifying projection='polar' gives a warning and has no effect for the bokeh backend, but if there's actually some way to get polar plots in Bokeh (as some of their examples seem to suggest) we should say how to do it here.

@jbednar
Member Author

jbednar commented Jan 20, 2016

Pandas_Conversion currently includes a bunch of material that is now in the Columnar_Data tutorial, and the rest is generating warnings. I left everything untouched after "Conversion to complex HoloViews components", because it's clearly a duplication, and on the rest I just did a light-touch editing job in case that text is still useful. @philippjfr will need to look at what remains, deleting the duplication and reworking the remainder, or just deleting the whole tutorial if no longer relevant. Some of it does seem relevant, since it explains a bit about how the Pandas interaction works, or I would have just deleted it myself.

@jbednar
Member Author

jbednar commented Jan 20, 2016

The Pandas_Seaborn tutorial needs attention similar to the Pandas_Conversion tutorial, i.e. avoiding deprecation warnings and focusing on what we currently recommend that users do, if not DFrame. Also, Out[3] raised an exception both when I ran it and on the public website, which is replaced with a warning when the "label=" options are removed, so I left it as-is for someone else to correct. In[6] also has an error if the "[joint=True]" option is specified, which again needs addressing.

@philippjfr
Member

The Bokeh tutorial will also probably force test data to be regenerated, as I made some small but (to my mind :-) important changes.

Bokeh tutorials are currently untested as we somehow have to sanitize the json to ignore various ids.

Re: Markers. Is this true even for bokeh? It does ok with the ones in the example, but is there a more definitive marker list for bokeh?

By default bokeh supports none of the short marker identifiers, but I did implement an mpl -> bokeh compatibility function, which among other things translates mpl markers to their bokeh equivalents. We'll have to decide whether to keep this or not.

I added the sentence Almost all matplotlib Element types may be projected onto a polar axis by supplying projection='polar' as a plot option, but polar plots are not currently supported in Bokeh.

I'll have to look into bokeh support for polar projections, if so could you point me to the examples?

Pandas_Conversion currently includes a bunch of material that is now in the Columnar_Data tutorial, and the rest is generating warnings. I left everything untouched after "Conversion to complex HoloViews components", because it's clearly a duplication, and on the rest I just did a light-touch editing job in case that text is still useful.

I'll have a look at what can be saved and move it into other tutorials, as it's probably not worth keeping this tutorial around.

@philippjfr
Member

extents = (0, 0, 10, 10)
img = hv.Image(np.random.rand(10, 10), bounds=extents)
img_coords = hv.Points(img.table(), extents=extents)
img + img * img_coords * hv.Points([img.closest([(5.1,4.9)])])(style=dict(color='r')) + img.sample([(5.1,4.9)])

Yes, this seems to reveal two bugs: first, I didn't account for snapping in 1D in sample; second, this shouldn't work at all, because 2D Points shouldn't be snapping even when only a 1D index is supplied. This example should be using Scatter instead.

@jbednar
Member Author

jbednar commented Jan 21, 2016

I think that the helper function to translate markers to bokeh seems useful, so I'd keep it unless it's an egregious hack (which it doesn't sound like).

The polar histogram example plot I saw in bokeh was at http://bokeh.pydata.org/en/latest/docs/gallery/burtin.html, but looking more closely at it, I don't see any usable polar-histogram implementation there; it's just a very low-level example of drawing wedges in a circle. So if someone needs polar histograms using the bokeh backend, we'd have to submit something to bokeh to implement them properly first.

What's the long-term plan for the Elements Tutorial, wrt backends? I've already made various edits to the matplotlib version that should presumably be mirrored into the Bokeh version, plus the Bokeh version has a few changes relative to the matplotlib version (plus apparently some missing Elements). Should we set up some automatic mechanism for generating a version specific to each backend, by applying patches to one master version? Presumably the master version would be easiest to maintain if it's an actual runnable notebook, i.e. the matplotlib version, perhaps with some markdown sections for code only for other versions, which would then be enabled during patching?

@jbednar
Member Author

jbednar commented Jan 21, 2016

BTW, is this statement from Elements really true?

Such a plot wouldn't be meaningful for Scatter, but is a valid use for Points, where the x and y locations are independent variables representing coordinates, and the "data" is conveyed by the size and color of the dots

Seems like a Scatter could easily be plotting two or three value dimensions, e.g. if someone measured the elevation and size of something distributed at regular intervals along a horizontal length, leading to a Scatter plot with height on y and size as dot size.

@jbednar
Member Author

jbednar commented Jan 21, 2016

The functionality of the TableConversion class may be conveniently accessed using the .to property, which should have its own tutorial someday, but hopefully this will get the idea across:

Is it still true that this is undocumented?

Internally, Path Element types hold a list of Nx2 arrays, specifying the x/y-coordinates along each path.

Is that still true? If not please update. E.g. what else can they be internally?

Also, I added an example of a non-gridded VectorField, as requested by someone (can't find this issue anymore!), which of course will break the tests for Elements. Feel free to improve that example, e.g. to make it more colorful or to show lines instead of arrows.

@jbednar
Member Author

jbednar commented Jan 26, 2016

Should index.rst be updated to say we support Python 3.5 now? If so we need a 3.5 test on Travis. Then again, it says we support Python 3.3, and there's only a 3.4 test on Travis, so I don't know how we can know if 3.3 support is true...

@philippjfr
Member

Since we want the documentation to be updated asap, I'll be updating the reference_data now and merge this PR. Any further fixes can be made in future PRs.

@jbednar
Member Author

jbednar commented Jan 27, 2016

Great, thanks! We need to make sure to visually check the generated output for the files whose reference data changed, to make sure we don't miss some subtle change in behavior.

@philippjfr
Member

I've gone through all the reference_data and it all looks fine so I've updated it and will merge now. I'll reopen this PR shortly and go through your other suggestions. Buildbot will push the updated website to dev.holoviews.org in a few minutes and I'll update the main website to match.

philippjfr added a commit that referenced this pull request Jan 28, 2016
@philippjfr philippjfr merged commit ef42361 into master Jan 28, 2016
@jlstevens jlstevens deleted the doc-cleanup branch February 4, 2016 16:07