Roadmap #7

mbauman · 2015-02-24T23:29:45Z

We're getting to the point where the core API is stabilizing. We can now start making things fancier and building out interfaces to base methods and other packages (like #5). Here's a current brain-dump of some of my thoughts. Additions, critiques and comments are very welcome.

Remaining core infrastructure

Vector and multi-dimensional setindex! (mirroring getindex's capabilities). (WIP: adds AxisArray setindex! methods #11)
Type-stable constructors. We should allow passing tuples of Axis types to specify the dimension names. It would be an interesting experiment to store the dimension names as Axis types instead of symbols. I'm not sure if that'd make things simpler or more convoluted. It may be a mixed bag.
Display. We should display axis names and values in a sensible way. In some regards, I think they're more important to see than a small window into the data, particularly with more than 2 dimensions. This isn't perfect, but 4b62efe is a pretty big improvement.
Online documentation and a README revamp. I think that Lexicon.jl/MkDocs is currently the easiest and best solution to make the inline documentation accessible online. Basic implementation from Add necessary Documenter.jl files #29.

Possible additions to the core infrastructure

Add a third flavor of axis trait for Dimensional axes whose elements are of a discrete, step-like type. The defining characteristic of such an element type is that its StepRanges must enumerate all values between the endpoints, allowing us to provide sensible indexing directly with a StepRange. This also means that there are no issues along the lines of floating-point instability, so we can also allow indexing directly with single elements of this type (and don't need to force the use of Intervals). The main use case I see here is for Date. Are there other types that satisfy these criteria? What is a sensible name for this trait?
Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup. Some experimentation is required here: I'm sure the linear search will outperform hashing for small N (particularly with symbols), but what's the cutoff? 10? 100? What about strings? Chances are that folks won't be using categorical vectors to enumerate more than 100 elements.
Allow an unsafe constructor that doesn't check axis invariants. Or maybe check upon wrapping with an Axis type (Move more logic into Axis type? #15). Ensuring that large, non-Range Dimensional axes are monotonically increasing can take a long time (we may be able to speed this up some with a special Ordering type, but it's still O(n)). Similarly, ensuring that elements in categorical vectors are unique requires hashing all elements (which could be used for the above hashmap).
Custom iterators. An eachslice iterator would be nice and useful in and of itself, and if we allow the same sort of syntax and semantics as mapslices it can serve as the building block for augmenting that Base function.
Windowed repetitions, with a more generic implementation of Signals.window. I was thinking of allowing windowing as an indexing operation (Interval types mbauman/Signals.jl#10), but constructing vectors of interval types with deferred promotion (to allow, e.g., windows specified in time about integer indices) has been very challenging.
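
As a rough illustration of the step-like axis idea in the list above, here is a minimal sketch in plain Julia using only the Dates standard library. The helper name `stepindex` is hypothetical and not part of AxisArrays; it assumes the axis is an increasing StepRange whose values can be compared exactly, which is the property that makes single-element indexing safe for Date.

```julia
using Dates

# An axis of consecutive days: the StepRange enumerates every Date
# between its endpoints, so exact lookup is well defined.
days = Date(2015, 1, 1):Day(1):Date(2015, 1, 31)

# Hypothetical helper: position of a single element on a sorted StepRange axis.
function stepindex(axis::StepRange, x)
    i = searchsortedfirst(axis, x)
    (i <= length(axis) && axis[i] == x) || throw(KeyError(x))
    return i
end

# Indexing with another StepRange: look up each of its elements in turn.
stepindex(axis::StepRange, r::StepRange) = [stepindex(axis, x) for x in r]

stepindex(days, Date(2015, 1, 10))                          # 10
stepindex(days, Date(2015, 1, 5):Day(2):Date(2015, 1, 9))   # [5, 7, 9]
```

The binary search keeps the sketch generic. A real implementation could presumably compute the position arithmetically from the step instead.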

Extensions to Base

We should specialize all Base functions that allow selecting specific dimensions, like sum, mean, maximum, mapslices, etc. We can also return AxisArrays with the properly reduced axis set, dropping a dimension and eliminating the type-unstable squeezes.
Permutations and transposes could keep track of and preserve axis information.
NamedArrays.jl and TimeSeries.jl go even farther to specialize matrix arithmetic to try to preserve names through operations. I'm not convinced this is worth the effort, but it is appealing.
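
To make the "properly reduced axis set" idea above concrete, here is a small sketch in plain Julia. It does not use the AxisArrays API: the axis names are carried in a separate tuple, and the function name `sumalong` is hypothetical, chosen only for illustration.

```julia
# A plain 2x3 array with hypothetical axis names kept alongside it.
A = reshape(1:6, 2, 3)
axisnames = (:time, :channel)

# Sum over the named axis, drop that dimension, and return the names
# of the axes that remain.
function sumalong(A::AbstractArray, axisnames::Tuple, name::Symbol)
    d = findfirst(isequal(name), axisnames)
    d === nothing && throw(ArgumentError("no axis named $name"))
    reduced = dropdims(sum(A, dims=d), dims=d)
    remaining = Tuple(n for (i, n) in enumerate(axisnames) if i != d)
    return reduced, remaining
end

reduced, remaining = sumalong(A, axisnames, :channel)
# reduced == [9, 12] and remaining == (:time,)
```

Note that this sketch looks the dimension up at run time, so it does not address the type-stability concern raised above. It only shows which axis information should survive the reduction.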

Extensions to other packages

It'd be nice to allow re-interpolating along any axis with Interpolations.jl. This will require some work on Interpolations.jl, too.
DSP.jl could provide filtering (High-level filtering mbauman/Signals.jl#1) with sensible units for filter design and spectrograms that return a properly annotated AxisArray with an extra frequency (or inverse-whatever) axis.


tshort · 2015-02-25T01:15:24Z

Great ideas, Matt!

timholy · 2016-07-22T20:00:25Z

> Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup.

Julia could definitely use a perfect-hashing "dictionary" type. If anyone tackles this, please make it a standalone package rather than burying it in some other package. There would be many users.

phaverty · 2016-10-29T22:06:57Z

> Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup.

NamedArrays now uses an OrderedDict for its axis names. Profiling showed that when subsetting such a NamedArray, much of the time went into making a new OrderedDict for the new NamedArray. I have a PR (#211) over at DataStructures that speeds this up quite a bit, but this PR is on hold until I can make it backwards compatible with Julia 0.4. (Any suggestions would be most welcome.)

gajomi · 2019-05-01T18:11:50Z

> Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup.

+1

> I'm sure the linear search will outperform hashing for small N (particularly with symbols), but what's the cutoff? 10? 100? What about strings? Chances are that folks won't be using categorical vectors to enumerate more than 100 elements.

FWIW, in biology it is not uncommon to deal with categories spanning thousands of species or tens of thousands of gene families. In medicine there are O(10^5) identifiers for diagnoses, and similar numbers for patients.
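
For reference, the two lookup strategies under discussion can be sketched in plain Julia as follows. This is only an illustration under assumptions: the label set, its size, and the helper names `linear_lookup` and `hashed_lookup` are made up here, and a Base `Dict` stands in for the ordered hashmap, which would additionally preserve axis order.

```julia
# A made-up categorical axis with 10,000 symbol labels (:gene1, :gene2, ...).
labels = [Symbol("gene", i) for i in 1:10_000]

# Linear search: no extra storage, but O(n) comparisons per lookup.
linear_lookup(labels, key) = findfirst(isequal(key), labels)

# Hashed lookup: build a label => position table once, then O(1) expected
# time per lookup. Returns nothing when the key is absent.
index = Dict(label => i for (i, label) in enumerate(labels))
hashed_lookup(index, key) = get(index, key, nothing)

linear_lookup(labels, :gene42)   # 42
hashed_lookup(index, :gene42)    # 42
```

Where the crossover between the two lies for small axes would have to be measured. The sketch makes no claim about it.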

nickrobinson251 · 2019-08-23T13:09:01Z

See also the work on new packages inspired by AxisArrays: JuliaCollections/AxisArraysFuture#1

tshort mentioned this issue Mar 5, 2015

Add a SortedVector for keyed axis indexes and hierarchical indexing #16

Merged

timholy mentioned this issue Sep 19, 2016

Support (c)transpose #42

Merged

glennmoy mentioned this issue Feb 8, 2021

Add LinearCombination transform invenia/FeatureTransforms.jl#8

Merged

nicoleepp mentioned this issue Feb 9, 2021

Make LinearCombination support specifying axis key as the dims? invenia/FeatureTransforms.jl#13

Closed

glennmoy mentioned this issue Feb 24, 2021

mapslices errors for AxisArrays #193

Open
