
[Expressions] Unifying datatable, kibana_datatable, and lens_multitable #68457

Closed
wylieconlon opened this issue Jun 5, 2020 · 36 comments

Labels: discuss, Feature:ExpressionLanguage (Interpreter expression language, aka canvas pipeline), Feature:Lens

@wylieconlon (Contributor)

Lens, Canvas, and Visualize all have different table implementations, which makes it difficult to define a standard library of table manipulation functions. For example, I would like to use the mapColumn function in Lens, but I can't due to its signature.

Comparison of the different tables:

  • datatable — used exclusively by Canvas, returned by essql. Special property: a type property on each column.
  • kibana_datatable — returned by esaggs. Special properties: formatHint, indexPatternId, and aggConfigParams, all of which are used by Lens.
  • lens_multitable — Lens wraps all kibana_datatables in this structure, which lets us combine data into one visualization. It's a Record<string, KibanaDatatable> representing layers.

To highlight some of the differences that might be fixable:

  • Lens needs to support multiple data sources, but maybe the lens_multitable approach can be replaced by a different one. Could we use variables instead?
  • SQL returns different information than esaggs, but I don't think it needs to.
  • Metadata returned by esaggs is particularly important to our use in Lens.

So as a starting point of discussion, I think there are two potential solutions to this:

a) Combine datatable and kibana_datatable into a single representation. Lens can use some solution to combine all the tables that doesn't involve a custom wrapper.
b) Change the signature of the table manipulation functions to accept any input and provide the same type of output

Would appreciate feedback from @ppisljar and @lukeelmers

@elasticmachine (Contributor)

Pinging @elastic/kibana-app-arch (Team:AppArch)

@ppisljar (Member) commented Jun 8, 2020

I agree. We should standardize the meta information added to the columns and then have all functions use just a single datatable.

Correct me if I am wrong, but formatHint is not needed on the kibana_datatable, as it's a subset of aggConfigParams. All it needs to work is the aggregation type and field.

Also, aggConfigParams is mostly unused; we only need field and agg_type so we can create the filter later.

So currently we have this meta on kibana_datatable:

  • type - the aggregation type used
  • indexPatternId
  • aggConfigParams - a set of parameters including:
    - field
    - a bunch of other things specific to each aggregation

What do you think of changing it into:

  • source: esaggs, essql, ... - indicates which source function produced the data
  • type: additional information about the datasource; would contain the aggregation type for esaggs, and Canvas could continue using what it currently does
  • field: the source field
  • dataset: index pattern ID or another dataset indicator ... or should we just call this indexPattern?

we should then also update the utilities like:

  • creating instance of field format based on this data
  • creating filters based on this data

@wylieconlon (Contributor, Author)

@ppisljar If I understand your comment, you are talking about changing the shape of both datatable and kibana_datatable to provide more metadata, which we definitely need. There are two questions I'd like you to answer specifically:

  • How do you imagine Lens will work with our need for multiple tables?
  • How will your proposed table structure work for something like getting the date interval from SQL? We need this to render charts correctly.

@wylieconlon (Contributor, Author)

Maybe a more specific question for @ppisljar or @streamich: Would it be reasonable for Lens to write an expression that sets each table to a variable, something like:

kibana
| kibana_context
| var_set name="table_a" value={kibana_context | esaggs ...}
| var_set name="table_b" value={kibana_context | esaggs ...}
| lens_xy_chart layers={lens_xy_layer layerId="1" table_var="table_a"} layers={lens_xy_layer layerId="2" table_var="table_b"}

@ppisljar (Member)

Yes @wylieconlon, I think that could be reasonable. Another option, as you mentioned, would be using a type that supports multiple tables.

And yes, I want to change the formats of all the datatables, or more specifically:

  • change datatable to contain all the metadata we need in all solutions
  • remove kibana_datatable

So the goal of this issue should be to figure out the minimal set of meta information we need to attach to the datatable to address the needs of Canvas, Lens and Visualize.

@flash1293 (Contributor) commented Jun 10, 2020

I think this is a nice approach @wylieconlon. It basically turns the expression into an imperative language: each var_set is an instruction, and the instructions are processed one by one, but no data is implicitly passed. In your example the two esaggs requests would not happen in parallel, but we could probably fix that by introducing a multi_var_set.

Another option (which I think is relatively similar but maybe makes the syntax a little cleaner) is to leverage the "context" passed along by keeping a "multitable" data structure around and simply providing an apply_multi_table function which applies any given datatable function to the tables within a multitable:

If we don't do anything to the tables, we can just pass them via expression (like it's happening right now in Lens):

kibana
| kibana_context
| multi_table name=table_a table={kibana_context | esaggs ...} name=table_b  table={kibana_context | esaggs ...}
| lens_xy_chart

The difference becomes obvious if tables are altered along the way.
A small example (fetching two tables, applying an operation to one of them, then merging them by concatenating the rows of both tables using another function):

With the variables approach it would look like this:

kibana
| kibana_context
| var_set name="table_a" value={kibana_context | esaggs ...}
| var_set name="table_b" value={kibana_context | esaggs ...}
| var_set name=table_a value={var_get name=table_a | transpose}
| var_set name=merged_table value={row_merge table={var_get name=table_a} table={var_get name=table_b}}
| lens_xy_chart layers={lens_xy_layer layerId="1" table_var="merged_table"}

With the multi table which can be passed along, the variables can be omitted:

kibana
| kibana_context
| multi_table name=table_a table={kibana_context | esaggs ...} name=table_b  table={kibana_context | esaggs ...}
| apply_multi_table name=table_a fn={transpose}
| row_merge
| lens_xy_chart

I might be missing something, but IMHO a multitable data structure and an apply_multi_table function check all the boxes:

  • Clean syntax by leveraging the pipeline aspect of expressions
  • Avoids deep nesting of expressions (those get hard to read quickly; IMHO it's much easier to digest a large "main expression" with only small sub-expressions along the way)
  • Leverages existing datatable functions
  • Introduces a standard for multiple data tables instead of forcing each consumer that needs them to implement their own using multiple arguments (e.g. table merging functions, charts which can use multiple tables, ...)
  • Still possible to use variables to pass things around if the user wants to
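To make the idea concrete, here is a minimal TypeScript sketch of the proposed semantics (illustrative types and names only, not the actual Kibana expressions API): a multitable is a named map of datatables, and apply_multi_table lifts any single-table function over one named entry.

```typescript
// Illustrative shapes, not the real Kibana interfaces.
interface Datatable {
  columns: Array<{ id: string; name: string }>;
  rows: Array<Record<string, unknown>>;
}

// A multitable is just a named map of datatables.
type Multitable = Record<string, Datatable>;

// Applies a single-table function to one named table, leaving the other
// tables untouched, so existing datatable functions can be reused as-is.
function applyMultiTable(
  mt: Multitable,
  name: string,
  fn: (table: Datatable) => Datatable
): Multitable {
  return { ...mt, [name]: fn(mt[name]) };
}
```

Every existing datatable function (transpose, mapColumn, ...) could then be applied to one layer of a multitable through this single helper, rather than needing a multitable variant of each function.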

@wylieconlon (Contributor, Author) commented Jun 16, 2020

@flash1293 I finally got a chance to read this and respond to it. While I agree with many of the criteria you've listed, I don't share the preference for keeping a multitable, because I think there is a "cleaner" way to pass multiple tables into expression functions. I do share your interest in keeping a clean syntax using expression chaining, so here is my updated proposal:

  • Data manipulation functions currently use the passed-in context, but could instead be provided with a variable name to read and update. This lets us break out of the chain when needed. This is basically what you wrote with apply_multi_table, since that example broke out of the chain as well. For example, var_set name="table_a" value={esaggs ... } | mapColumn table="table_a"
  • Not all visualizations should support a multi-table: the only use case for this is stacking XY series. So I propose that the default way of passing tables into a renderer is unchanged (by using context) while for a renderer that takes multiple tables, it should be using a multi argument like this: var_set name="table_a" value={esaggs ...} name="table_b" value={esaggs ...} | lens_xy_chart table="table_a" table="table_b"

Putting this together into the same examples as you've given:

kibana
| kibana_context
| var_set name="table_a" value={esaggs ...} name="table_b"  value={esaggs ...}
| mapColumn table="table_a" name="columnId" expression={getCell "columnId" | replace "\W+" "_" }
| lens_xy_chart table="table_a" table="table_b" 

I am proposing that we modify all of the existing table manipulation functions to support reading and writing from variables, and to update the Lens expression functions to accept variable names instead of passing via context.

The main benefit of the approach I'm proposing here is that we don't need to write a new set of functions like multi_table and apply_multi_table.

@flash1293 (Contributor)

This seems like it's trading the overhead of maintaining a separate multitable data structure and utility functions for pulling the variable logic into all table-related functions.

I don't have a strong preference in either direction - the maintenance effort seems roughly identical to me. Maybe someone from the @elastic/kibana-canvas team wants to weigh in whether extending the table manipulation functions with variable getters like this is something that fits in the greater scope.

In short: Sounds good to me as well.

@ppisljar (Member)

Regarding variables: no function should be allowed to read/write variables; only the expression parser can do that (via the special var_set and var functions). That doesn't negate anything in the above suggestion, however; only the syntax would change a bit, but IMO in a better (more flexible) way:

| lens_xy_chart table={var table_a} table={var table_b} table={esaggs ....}

However, I think we will need a multitable type. I am working on esdsl at the moment, and esdsl can return raw documents and aggregated data. The best solution I currently see is to use a multitable and then have an automatic converter from multitable to single table (where we take the first one) and a special function which lets you select a specific table out of a multitable.

@flash1293 (Contributor) commented Jun 17, 2020

@ppisljar If this is just about esdsl, I don't think we need to introduce a multitable concept just because of it. I recently learned it's bad practice to fetch both documents and aggregated data in the same request anyway, so we could just avoid that case altogether and always work with a single table as the result of esaggs/esdsl. If a user wants to fetch both aggregations and documents, they should use esdsl twice and use separate vars for it.

@wylieconlon (Contributor, Author)

The main reason I wanted to avoid a multitable is because I want to avoid an explosion of table manipulation functions: it is important to me that we keep the standard library of table manipulation functions small, without creating a separate function for datatable and multitable. Do you think that the multitable approach would be compatible with all of the existing functions @ppisljar?

@flash1293 (Contributor)

@wylieconlon Why do you think there would be an explosion of functions? Using something like apply_multi_table you can leverage all data table related functions. So it would be a constant overhead, not a linear one.

@wylieconlon (Contributor, Author)

@flash1293 the part I liked about @ppisljar's proposal is that we would have an automatic converter for multitables: this would hopefully make it so all existing expressions written by Canvas users will continue to work. Basically I think the function you've proposed makes it so that our users need to know more about our internals than I think they should.

Also @ppisljar you've asked for a set of requirements for the metadata: we already had a separate issue for it here #65262

@flash1293 (Contributor)

@wylieconlon I like the auto-converting approach and we should go forward with it but we still would need something like apply_multi_table for cases where the user wants to work with multiple tables specifically. IMHO it's not an "internal" whether you are working with a single or multiple tables - users have to decide which table they want to apply a function to. In the vars proposal this fact is just encoded in a different way.

I don't have a strong opinion here because I think the various approaches are very close to each other in terms of usability. Can we commit to one?

@ppisljar (Member)

Auto-converting will only get you so far; I agree with Wylie that the multitable can be a problem.

for example (lets assume for a moment esdsl returns multitable):

esdsl ... | any table processing | xy_chart

Any table processing will degrade that multitable into a single table, so the xy chart no longer gets all the tables.

what about going in this direction:
lens_xy_chart table={esaggs ...} table={esaggs ...} table={esaggs ....}

so no multitable, but Lens accepting tables as arguments (instead of, or next to, the input)

Actually, how exactly do you use the multitable in Lens today? Something like
join_tables table={esaggs} table={esaggs} | xy_chart ?

Also, do you need a multitable for your other chart types, like pie?

@wylieconlon Is getting aggs and docs together so bad that we should not support it in any case? What I am most worried about is that not supporting it will most likely mean we still query for it but just ignore the results, unless additional steps are taken to make sure that's not the case.

@flash1293 (Contributor)

what about going in this direction:
lens_xy_chart table={esaggs ...} table={esaggs ...} table={esaggs ....}

so no multitable, but Lens accepting tables as arguments (instead of, or next to, the input)

This sounds like Wylie's approach to me and I'm fine with it. Restructuring Lens shouldn't be too complicated.

Actually, how exactly do you use the multitable in Lens today? Something like
join_tables table={esaggs} table={esaggs} | xy_chart ?

Yes, exactly.

Also, do you need a multitable for your other chart types, like pie?

Not right now, only xy chart is supporting multiple layers.

Is getting aggs and docs together so bad that we should not support it in any case?

@ppisljar I assume this question is directed at me because I brought up that point. It's not a big deal to get aggs and docs together, but as it's not best practice anyway I don't see a good reason we have to support it if it is causing an overhead elsewhere (like the introduction of a whole separate data structure). If a user absolutely wants to do this, they can do two separate esdsl calls and store them in variables.

What I am most worried about is that not supporting it will most likely mean we still query for it but just ignore the results, unless additional steps are taken to make sure that's not the case.

In my POC the esdsl function actually looked like this: esdsl aggs="{...}" - so you could only specify aggs, and size would be set to 0 for the request. It should be easy to extend that with arguments for the other top-level props of a request and just throw in case a user tries to do both at once: esdsl query="{}" size=10 ... - if the user defines aggs but not size, size is set to 0; if they don't define aggs, size defaults to 10 or something like that.
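The defaulting and validation rules described above could be sketched roughly like this (a hypothetical sketch only: the argument names, the default of 10, and the exact conflict rule are illustrative, not the actual esdsl implementation):

```typescript
// Hypothetical argument shape for an esdsl-style function.
interface EsdslArgs {
  query?: string; // serialized query DSL
  aggs?: string;  // serialized aggs DSL
  size?: number;
}

// aggs without an explicit size forces size to 0; without aggs, size falls
// back to a document-fetch default; asking for both at once throws.
function resolveSize(args: EsdslArgs, defaultDocSize = 10): number {
  if (args.aggs !== undefined && args.size !== undefined && args.size > 0) {
    throw new Error('specify either aggs or a non-zero size, not both');
  }
  if (args.size !== undefined) return args.size;
  return args.aggs !== undefined ? 0 : defaultDocSize;
}
```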

@ppisljar (Member)

I prefer to leave this as a real esdsl function allowing you to provide the full DSL, for maximum flexibility. I don't think introducing a new datatype is an issue. And if Lens is not using a multitable and only esdsl is in question, it won't use a multitable either, but rather do something like:

  • esdsl returns a raw_response
  • raw_response can automatically convert to datatable (and kibana_datatable); it takes the first table (docs) unless docs.length === 0, in which case it takes aggs
  • we provide a parse_raw_response function which accepts a raw_response and additional info about which table (docs or aggs) you want to extract, and possibly other stuff like index pattern info etc.
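As an illustration only (the names and shapes here are assumptions, not the actual implementation), the conversion rule in the second and third bullets could be sketched as:

```typescript
// Simplified stand-ins for the real table types.
interface SimpleTable { rows: unknown[] }
interface RawResponse { docs: SimpleTable; aggs: SimpleTable }

// Automatic conversion: prefer the docs table unless it is empty.
function toDatatable(resp: RawResponse): SimpleTable {
  return resp.docs.rows.length > 0 ? resp.docs : resp.aggs;
}

// Explicit selection, corresponding to the proposed parse_raw_response.
function parseRawResponse(resp: RawResponse, which: 'docs' | 'aggs'): SimpleTable {
  return resp[which];
}
```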

@flash1293 (Contributor)

@ppisljar +1 for solving that as part of the esdsl function and not influencing how other functions work.

timroes mentioned this issue Jun 25, 2020
ppisljar added the Feature:ExpressionLanguage (Interpreter expression language, aka canvas pipeline) label Jul 7, 2020
@lukeelmers (Member) commented Jul 16, 2020

Have spent some time talking this over with @ppisljar. We have a couple updates:

First, there's a PR open for the esdsl function in case you're interested in following along.

Second, we have a proposed datatable structure that we think might solve everyone's needs... but please leave your thoughts and help us poke holes in it 🙂

This was our strategy with dealing with metadata:

  1. There is table-level and column-level meta
  2. meta keys are predictable and always present
  3. The only place that contains arbitrary function-specific meta is the column's meta.params, and everything that lives here must be serializable.
    • Currently we have examples of how these params would look for esaggs but need input from y'all on what the needed params would be for essql and esdsl, if any

More details in the comments below...

interface Datatable {
  meta: DatatableMeta;
  columns: DatatableColumn[];
  rows: DatatableRow[];
}

interface DatatableMeta {
  // Name of expression function which retrieved data
  type: string;
  // Kibana index pattern ID, ES index pattern, or table name
  source: string;
}

interface DatatableRow {
  // `key` is a `DatatableColumn.name`
  [key: string]: unknown;
}

interface DatatableColumn {
  name: string; // this is basically the ID
  meta: DatatableColumnMeta;
}

interface DatatableColumnMeta {
  // the agg type or boolean, string, etc
  type: string;
  // name of the field from the data that this column represents
  field: string;
  // arbitrary params which are function-specific but must be serializable
  params: SerializableState | {};
}

// Helper for getting a field format
type getFieldFormatForDatatableColumn = (datatable: Datatable, column: string | number) => FieldFormat;

Examples

esaggs
const esaggsTable: Datatable = {
  meta: {
    type: 'esaggs',
    source: '90943e30-9a47-11e8-b64d-95841ca0b247',
  },
  columns: [
    {
      name: 'response.keyword: Descending',
      meta: {
        type: 'count',
        field: 'response.keyword',
        // for esaggs, params are basically the serialized AggConfig
        params: {
          size: 5,
          order: 'desc',
          orderBy: '1',
          otherBucket: false,
          missingBucket: false
        },
      }
    },
    {
      name: 'Average bytes',
      meta: {
        type: 'avg',
        field: 'bytes',
        params: {},
      }
    },
  ],
  rows: [{ 'response.keyword: Descending': '200', 'Average bytes': 5926.317024128686 }],
};
essql
const essql: Datatable = {
  meta: {
    type: 'essql',
    source: 'kibana_sample_data_logs',
  },
  columns: [
    {
      name: 'response.keyword: Descending',
      meta: {
        type: 'string',
        field: 'response.keyword',
        params: {},
      }
    },
    {
      name: 'Average bytes',
      meta: {
        type: 'number',
        field: 'bytes',
        params: {},
      }
    },
  ],
  rows: [{ 'response.keyword: Descending': '200', 'Average bytes': 5926.317024128686 }],
};
esdsl
const esdsl: Datatable = {
  meta: {
    type: 'esdsl',
    source: 'kibana_sample_data_logs',
  },
  columns: [
    {
      name: 'response.keyword: Descending',
      meta: {
        type: 'string',
        field: 'response.keyword',
        params: {},
      }
    },
    {
      name: 'Average bytes',
      meta: {
        type: 'number',
        field: 'bytes',
        params: {},
      }
    },
  ],
  rows: [{ 'response.keyword: Descending': '200', 'Average bytes': 5926.317024128686 }],
};

One notable change to point out is that we didn't put an id in the columns/rows, instead using the column names. We didn't see a technical reason why this would be a problem, but as the names are meant to be more human-readable, we can preserve the existing tabify ID style (col-1-1, col-0-2) if desired.

Another idea here (which would presumably be a much larger change) would be to use nested arrays as our rows, instead of the arrays of objects with IDs we currently have:

interface Datatable {
   meta: DatatableMeta;
   columns: DatatableColumn[];
-  rows: DatatableRow[];
+  rows: unknown[][];
}
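For illustration, a hypothetical converter from today's keyed row objects to the proposed positional rows (values ordered by the columns array; at this point in the proposal columns are keyed by name — all names here are illustrative):

```typescript
// Minimal column shape for this sketch.
interface Column { name: string }

// Turns keyed row objects into unknown[][] ordered by the columns list.
function rowsToArrays(
  columns: Column[],
  rows: Array<Record<string, unknown>>
): unknown[][] {
  return rows.map((row) => columns.map((col) => row[col.name]));
}
```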

@flash1293 (Contributor) commented Jul 17, 2020

Thanks for the great summary, @lukeelmers ! I got one question, one nit and one concern.

The question: the column type prop is "the agg type or boolean, string, etc" - can we formalize this a little better? If the meaning of the type depends on the function used to create the data table, it gets much harder to use - e.g. is date the same thing for all data sources, or does it mean something else when the data table comes from essql? My proposal is to make type a finite list like 'string' | 'boolean' | 'number' | 'object' | 'unknown', keeping function-specific stuff like agg types in the params.

The nit: about the field meta info - I don't think we can populate this in a lot of cases. In your sql and dsl examples each column has field info, but when doing calculations and more complex queries in general, it's often not possible or meaningful to map a column to a field. IMHO this should simply be an optional property, forcing all consumers to handle the missing case explicitly.

The concern: For the Lens integration we currently rely on predictable column naming (col-1-1) to map the columns correctly to the chart dimensions. If this is a dynamic string like @timestamp per 3 hours I'm not sure how we can do this in a clean manner - using something like column.name.startsWith('@timestamp') seems very wrong.

Using just the predictable ids like col-1-1 also seems insufficient because the human readable labels are carrying dynamic information (e.g. the chosen auto interval) we rely on as well.

Seems like Visualize has the same problem here, how are you planning to map these?

My suggestion would be making the name a predictable technical thing with a "well known" meta property label:

interface DatatableColumn {
  name: string; // ID like col-1-1
  meta: DatatableColumnMeta;
}

interface DatatableColumnMeta {
  // ...
  // human readable label if the datasource can  provide it
  label?:  string;
}

@lukeelmers (Member)

Thanks @flash1293 -- these are all great points.

The column type prop is the agg type or boolean, string, etc - can we formalize this a little better?
My proposal is to make type a finite list like 'string' | 'boolean' | 'number' | 'object' | 'unknown'

That's a fair question & I agree the meaning is a bit loosely defined. I think we were starting from thinking about how esaggs worked, and then extending that line of thinking to esdsl and essql which is how we ended up in this place.

I think your proposal of making this a finite list makes sense. We could make this work with esaggs by making these values match the values of KBN_FIELD_TYPES -- what are your thoughts on that? This would include all of the items you listed above, as well as date, IP, geo shape, etc:

enum KBN_FIELD_TYPES {
  _SOURCE = '_source',
  ATTACHMENT = 'attachment',
  BOOLEAN = 'boolean',
  DATE = 'date',
  GEO_POINT = 'geo_point',
  GEO_SHAPE = 'geo_shape',
  IP = 'ip',
  MURMUR3 = 'murmur3',
  NUMBER = 'number',
  STRING = 'string',
  UNKNOWN = 'unknown',
  CONFLICT = 'conflict',
  OBJECT = 'object',
  NESTED = 'nested',
  HISTOGRAM = 'histogram',
}

About the field meta info - I don't think we can populate this in a lot of cases.

That's a good point; I hadn't considered how field may not always be applicable. The original intent was to have these meta keys be as predictable as possible, i.e. they are all required, but perhaps making them optional is more sensible in this case.

For the Lens integration we currently rely on predictable column naming (col-1-1) to map the columns correctly to the chart dimensions.

Thanks for pointing this out; we had assumed that nobody was relying on predictable naming like this, which is why we excluded the old way of doing column IDs to simplify. However we can definitely add the id back in as there's a use for it.

Revised interfaces, lmk what you think:

interface Datatable {
  meta: DatatableMeta;
  columns: DatatableColumn[];
  rows: DatatableRow[];
}

interface DatatableMeta {
  type: string;  // expression function which retrieved data
  source: string; // index pattern ID, ES index pattern, or table name
}

interface DatatableRow {
  [key: string]: unknown; // `key` is a `DatatableColumn['id']`
}

interface DatatableColumn {
  id: string; // predictable ID like `col-1-1`
  name: string; // human-readable name
  meta: DatatableColumnMeta;
}

interface DatatableColumnMeta {
  type: 'unknown' | 'string' | 'object' | 'date' | 'ip' | etc; // union of KBN_FIELD_TYPES values
  field?: string;  // if column data represents a field, the name goes here
  params: SerializableState | {}; // arbitrary params which are function-specific
}

@flash1293 (Contributor)

Hi @lukeelmers ,

the revised interfaces look great to me, thanks. I don't know how difficult it is for Canvas to provide a label along with an id, so I would also be fine with making name optional if it's too hard to populate for some data sources. On the other hand, a data source can always just duplicate the id in the name property, so no strong opinion here.

@ppisljar (Member)

in the worst case we can always go with name=id

@flash1293 (Contributor) commented Jul 28, 2020

@ppisljar Currently in Lens we rely on both name and id (human readable name and predictable col-... id scheme), so we can't make these the same thing without changes in that area (which I'm happy to do if we find a good alternative)

Edit: I think I misunderstood what your comment is about, if you mean to set name to the same value as id if there is no separate human readable label, then please disregard.

ppisljar added a commit to ppisljar/kibana that referenced this issue Jul 28, 2020
@ppisljar (Member)

Started looking into essql, and it seems it's not possible to provide index or field names.
To do that we would need to parse the query and extract that information. Are there by any chance any existing utilities that could do that?

@flash1293 (Contributor) commented Jul 28, 2020

I think it's very hard to do that reliably considering how much you can do with a grammar like SQL (even within the tight boundaries of ESSQL). When looking into this in my spare time, I imagined having a separate argument to the function (or maybe a separate expression function) to manually specify the fields to make filters work.

This means:

  • By default, no meta information at all
  • Allow the user to manually add it

We could put a step in between those two, but I'm not sure how valuable it is:

  • Parse the query and auto-fill field names on best effort basis

@flash1293 (Contributor)

I think we have a good idea of how to unify datatable and kibana_datatable, but I'm not sure we've reached a conclusion yet for lens_multitable.

The latest state of the discussion is @wylieconlon's proposal:

I am proposing that we modify all of the existing table manipulation functions to support reading and writing from variables, and to update the Lens expression functions to accept variable names instead of passing via context.

kibana
| kibana_context
| var_set name="table_a" value={esaggs ...} name="table_b" value={esaggs ...}
| mapColumn table="table_a" name="columnId" expression={getCell "columnId" | replace "\W+" "_" }
| lens_xy_chart table="table_a" table="table_b"

As this has an impact on the "standard library" of expression functions, I would like to bring it up again. In short: besides getting the table to work with passed in via context, all table manipulation functions would get a separate table argument which, when set, is used as the data source and data target instead. @ppisljar @clintandrewhall are you fine with this change? If this is a reasonable approach from your point of view, the Lens team can start refactoring expression usage towards this goal.

@ppisljar (Member) commented Aug 6, 2020

i think we should rather go with something like this:

kibana
| kibana_context
| var_set name="table_a" value={esaggs ...} name="table_b" value={esaggs ...}
| var "table_a" | mapColumn name="columnId" expression={getCell "columnId" | replace "\W+" "_" } | var_set name="table_a"
| lens_xy_chart table="table_a" table="table_b"

which avoids changing all datatable manipulation functions. I am worried about that change because:

  1. updating all functions .... what about 3rd party/solution plugins?
  2. how do we enforce it? and if we can't, how good is it?
  3. it seems to bring a new concept into expressions, which we should avoid if possible, so as not to make expressions more complicated than they are.

@ppisljar (Member) commented Aug 6, 2020

actually, what you should do in your above example is probably:

kibana
| kibana_context
| var_set name="table_a" value={esaggs ... | mapColumn name="columnId" expression={getCell "columnId" | replace "\W+" "_" }} name="table_b" value={esaggs ...}
| lens_xy_chart table="table_a" table="table_b"

(simpler syntax, but achieves the same as my first suggestion). So at the moment I don't see a need for introducing this table argument on every table manipulation function.

@flash1293 (Contributor)

@wylieconlon Do you see any problems with @ppisljar's suggestion? To me it looks like a nice middle ground which still gives us everything we need without making "weird" changes to the base functions.

@clintandrewhall (Contributor) commented Aug 6, 2020

@wylieconlon @ppisljar @flash1293 Canvas exports workpads in JSON format. I don't see how we can change from context to var without breaking older workpads.

EDIT: Chatted with @flash1293 -- as long as these changes don't break old workpads and allow Canvas users to use tables as they have throughout 7.x, I don't see an issue. I'll do a sanity-check with the rest of @elastic/kibana-canvas to be certain.

@wylieconlon (Contributor, Author)

Yes, this looks like it will give us everything we need. It also extends to future uses such as joins, which I imagine would look like this:

kibana
| kibana_context
| var_set name="table_a" value={esaggs ...} name="table_b" value={esaggs ...}
| var_set name="joined" value={left_join left_key="user" right_key="name" table={var "table_a"} table={var "table_b"}}
| lens_xy_chart table="joined"

The main change here is that we are expanding the use of variables, and having the Lens render function ignore the context and instead require variable names. This keeps the scope of the change small for other apps like Canvas.

@flash1293 (Contributor)

I branched out the removal of the Lens multitable into #74643. @lukeelmers can we close this discussion issue, or will you use it to track changes in the table format?

@ppisljar (Member) commented Aug 10, 2020

agreed on this interface:

interface Datatable {
  meta?: DatatableMeta;
  columns: DatatableColumn[];
  rows: DatatableRow[];
}

interface DatatableMeta {
  type: string;  // expression function which retrieved data
  source?: string; // index pattern ID, ES index pattern, or table name
}

interface DatatableRow {
  [key: string]: unknown; // `key` is a `DatatableColumn['id']`
}

interface DatatableColumn {
  id: string; // predictable ID like `col-1-1`
  name: string; // human-readable name
  meta: DatatableColumnMeta;
}

interface DatatableColumnMeta {
  type: 'unknown' | 'string' | 'object' | 'date' | 'ip' | etc; // union of KBN_FIELD_TYPES values
  field?: string;  // if column data represents a field, the name goes here
  params?: SerializableState; // arbitrary params which are function-specific
}

@ppisljar (Member)

As a table can be the result of multiple functions (when merging tables, etc.), we shouldn't keep any metadata on the table itself; all the relevant info should be kept on the columns. The new proposal:

interface Datatable {
  columns: DatatableColumn[];
  rows: DatatableRow[];
}

interface DatatableRow {
  [key: string]: unknown; // `key` is a `DatatableColumn['id']`
}

interface DatatableColumn {
  id: string; // predictable ID like `col-1-1`
  name: string; // human-readable name
  meta: DatatableColumnMeta;
}

interface DatatableColumnMeta {
  type: 'unknown' | 'string' | 'object' | 'date' | 'ip' | etc; // union of KBN_FIELD_TYPES values
  field?: string;  // if column data represents a field, the name goes here
  index?: string; // index (if field is set)
  params?: SerializableState; // arbitrary params for field formatter
  source?: string; // name of the function that generated this data (esaggs, esdocs, esdsl, essql)
  sourceParams?: SerializableState; // additional parameters specific to each source function
}
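A hedged example of what a column in this final shape might look like, with a small helper showing how a consumer could look up filter-relevant info by the predictable id. (SerializableState is approximated by a plain record here, and the concrete values are illustrative, not taken from a real response.)

```typescript
// Approximation of the agreed column shape; SerializableState is stood in
// for by a plain record, and `type` is loosened to string for this sketch.
interface DatatableColumnMeta {
  type: string;            // one of the KBN_FIELD_TYPES values
  field?: string;          // source field, if the column represents one
  index?: string;          // index, if field is set
  params?: Record<string, unknown>;
  source?: string;         // esaggs, esdocs, esdsl, essql
  sourceParams?: Record<string, unknown>;
}

interface DatatableColumn {
  id: string;   // predictable ID like `col-1-1`
  name: string; // human-readable name
  meta: DatatableColumnMeta;
}

// Looks up the source field of a column by its predictable id, e.g. as a
// first step toward building a filter from a chart interaction.
function fieldForColumn(columns: DatatableColumn[], id: string): string | undefined {
  return columns.find((c) => c.id === id)?.meta.field;
}

const exampleColumns: DatatableColumn[] = [
  {
    id: 'col-0-1',
    name: 'Average bytes',
    meta: { type: 'number', field: 'bytes', index: 'kibana_sample_data_logs', source: 'esaggs' },
  },
];
```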

@flash1293 (Contributor)

For Lens this would work as well. There might be some strange edge cases, but it covers a lot of legit cases we would have run into anyway at some point.
