Support for custom fields configured using SQL queries #16

Open
simonw opened this issue Aug 3, 2020 · 19 comments
Labels
enhancement New feature or request

Comments


simonw commented Aug 3, 2020

Datasette canned queries could work really well with this plugin, especially in conjunction with the ability to restrict direct access to tables in #4.

Andy Ingram on Twitter says: https://twitter.com/andrewingram/status/1289624490037530625

The GraphQL creators are (I think) unanimous in their skepticism of tools that bring GraphQL directly to your database or ORM, because they just provide carte blanche access to your entire data model, without actually giving API design proper consideration.

By turning off direct access to tables and instead using canned queries, developers can use this plugin to design the exact GraphQL mechanism that they want. Canned query arguments will turn into GraphQL arguments.

Writable canned queries could even be used to define GraphQL mutations.
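For illustration, here's a rough sketch (in Python dict form) of the kind of canned query this would apply to; the fixtures database, dogs table and :species parameter are made up, and the GraphQL shape shown in the comment is only a guess at what the mapping could look like.

# Hypothetical canned query definition, as it might appear in Datasette metadata.
metadata = {
    "databases": {
        "fixtures": {
            "queries": {
                "dogs_by_species": {
                    "sql": "select * from dogs where species = :species"
                }
            }
        }
    }
}
# Under this proposal, GraphQL might expose something like:
#   { dogs_by_species(species: "corgi") { ... } }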

simonw added the enhancement label Aug 3, 2020

simonw commented Aug 4, 2020

Need to think about how magic parameters will work: https://datasette.readthedocs.io/en/stable/sql_queries.html#magic-parameters

simonw mentioned this issue Aug 6, 2020
simonw added this to the 1.0 milestone Aug 6, 2020

simonw commented Aug 6, 2020

Initial implementation just needs to support read-only canned queries with their parameters turned into GraphQL arguments.


simonw commented Aug 6, 2020

I'll implement first: and after: pagination too. I won't bother handling the case where a canned query itself has parameters called first: or after: - those queries can be rewritten to use different parameter names.


simonw commented Aug 6, 2020

Here's a tricky aspect of canned queries: how do we know what columns will be returned so we can define them in the schema? And how do we know the types of those columns? We may have to return them all as strings.

Running the query with "limit 0" is enough to get back the column names:

https://latest.datasette.io/fixtures.json?sql=select+neighborhood%2C+facet_cities.name%2C+state%0D%0Afrom+facetable%0D%0A++++join+facet_cities%0D%0A++++++++on+facetable.city_id+%3D+facet_cities.id%0D%0Awhere+neighborhood+like+%27%25%27+||+%3Atext+||+%27%25%27%0D%0Aorder+by+neighborhood+limit+0

Column types may not be possible to obtain in advance. Might have to default to string and allow people to override those in plugin configuration somehow.
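A rough sketch of that trick using Python's sqlite3 module directly (the table and query here are made up): cursor.description gives back the column names, but as noted it doesn't give their types.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table facetable (neighborhood text, city_id integer, state text)")

# The user's canned query (hypothetical example).
sql = "select neighborhood, state from facetable where neighborhood like '%' || :text || '%'"

# Wrapping it in a subselect with "limit 0" fetches no rows, but
# cursor.description still reports one entry per returned column.
cursor = conn.execute("select * from ({}) limit 0".format(sql), {"text": ""})
print([col[0] for col in cursor.description])
# ['neighborhood', 'state']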

simonw added a commit that referenced this issue Aug 6, 2020

simonw commented Aug 7, 2020

It would be neat if you could optionally specify that certain columns returned by a canned query are actually IDs that correspond to primary keys in other tables, to enable foreign key nested expansions.


simonw commented Aug 7, 2020

Maybe this isn't about existing canned queries at all: maybe it's about being able to define custom named fields in terms of SQL queries, with type information and explicit arguments and nested SQL definitions for nested fields.

All this could go in the plugin configuration.


simonw commented Aug 7, 2020

It would also be good if these could support cursor-based pagination. This is a feature missing from Datasette canned queries at the moment. It requires a defined sort order.

simonw changed the title from Support for canned queries to Support for configured (mayb canned?) queries Aug 8, 2020

simonw commented Aug 8, 2020

For the problem of detecting what types a SQL query will return, I may have a partial solution:

sqlite> create table tmp as select * from foo;
sqlite> .schema
CREATE TABLE foo (id int, name text, score float);
CREATE TABLE tmp(id INT,name TEXT,score REAL);
sqlite> create table tmp2 as select 1, 'dog', true, 34.5, * from foo;
sqlite> .schema
CREATE TABLE foo (id int, name text, score float);
CREATE TABLE tmp(id INT,name TEXT,score REAL);
CREATE TABLE tmp2(
  "1",
  "'dog'",
  true,
  "34.5",
  id INT,
  name TEXT,
  score REAL
);
sqlite> create table tmp3 as select 1 as c1, 'dog' as c2, true as c3, 34.5 as c4, * from foo;
sqlite> .schema
CREATE TABLE foo (id int, name text, score float);
CREATE TABLE tmp(id INT,name TEXT,score REAL);
CREATE TABLE tmp2(
  "1",
  "'dog'",
  true,
  "34.5",
  id INT,
  name TEXT,
  score REAL
);
CREATE TABLE tmp3(
  c1,
  c2,
  c3,
  c4,
  id INT,
  name TEXT,
  score REAL
);
sqlite> pragma table_info(tmp4);
sqlite> pragma table_info(tmp3);
cid|name|type|notnull|dflt_value|pk
0|c1||0||0
1|c2||0||0
2|c3||0||0
3|c4||0||0
4|id|INT|0||0
5|name|TEXT|0||0
6|score|REAL|0||0
sqlite> 

So basically the trick is to run create table tmp as select ... limit 0 (within a transaction) - then inspect that new tmp table to see what types its columns have before rolling back the transaction to delete the temporary table.

This appears to work for queries that select from a real table, but weirdly it can't identify the type of literal values that were included in the select.
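Here's a rough Python sketch of that trick using the sqlite3 module directly; infer_column_types and its fallback to TEXT are assumptions for illustration, not existing plugin code.

import sqlite3

def infer_column_types(conn, sql):
    # Materialize the query as a throwaway table with "limit 0", read the
    # declared column types, then roll the whole thing back.
    conn.execute("begin")
    try:
        conn.execute("create table _type_probe as select * from ({}) limit 0".format(sql))
        rows = conn.execute("pragma table_info(_type_probe)").fetchall()
        # Rows are (cid, name, type, notnull, dflt_value, pk); the type comes
        # back as an empty string for literals and expressions, so fall back
        # to TEXT for those.
        return {name: (coltype or "TEXT") for _, name, coltype, *_ in rows}
    finally:
        conn.execute("rollback")

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# begin/rollback above controls the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table foo (id int, name text, score float)")
print(infer_column_types(conn, "select 1 as c1, name || 'dog' as c2, * from foo"))
# {'c1': 'TEXT', 'c2': 'TEXT', 'id': 'INT', 'name': 'TEXT', 'score': 'REAL'}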


simonw commented Aug 8, 2020

Using limit 1 may well fix this, but only if the query can return at least one row.


simonw commented Aug 8, 2020

No, even the limit 1 trick didn't correctly detect the types of those literal values:

sqlite> create table foo5 as select 1 as c1, 'dog' as c2, true as c3, 34.5 as c4, * from foo limit 1;
sqlite> pragma table_info(foo5);
cid|name|type|notnull|dflt_value|pk
0|c1||0||0
1|c2||0||0
2|c3||0||0
3|c4||0||0
4|id|INT|0||0
5|name|TEXT|0||0
6|score|REAL|0||0
sqlite> select * from foo5;
c1|c2|c3|c4|id|name|score
1|dog|1|34.5|1|simon|3.5


simonw commented Aug 8, 2020

I can assume a type of TEXT for any columns whose type was not detected.


simonw commented Aug 8, 2020

An operation that concatenated text onto a value from a real column also resulted in a type that was not detected correctly:

sqlite> create table foo6 as select 1 as c1, name || 'dog' as c2, true as c3, 34.5 as c4, * from foo limit 0;
sqlite> pragma table_info(foo6);
cid|name|type|notnull|dflt_value|pk
0|c1||0||0
1|c2||0||0
2|c3||0||0
3|c4||0||0
4|id|INT|0||0
5|name|TEXT|0||0
6|score|REAL|0||0

simonw changed the title from Support for configured (mayb canned?) queries to Support for configured (maybe canned?) queries Aug 8, 2020

simonw commented Aug 9, 2020

Columns without types can be tricky: they won't benefit from type coercion, so queries like where colname = '23' will fail if the column contains an integer - see https://www.sqlite.org/forum/forumpost/e3ed929e7c
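A quick illustration of the problem in plain sqlite3 (nothing plugin-specific here):

import sqlite3

conn = sqlite3.connect(":memory:")
# Column "c" has no declared type, so it gets no type affinity.
conn.execute("create table t (c)")
conn.execute("insert into t values (23)")

# Without affinity there's no coercion, so the string '23' never matches
# the stored integer 23.
print(conn.execute("select count(*) from t where c = '23'").fetchone()[0])  # 0
print(conn.execute("select count(*) from t where c = 23").fetchone()[0])    # 1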


simonw commented Aug 9, 2020

Configuration concept:

plugins:
  datasette-graphql:
    databases:
      github:
        fields:
          recent_releases:
            sql: |-
              select * from releases order by created_at desc
            fields:
              repo:
                pk_for_table: repos
              contributors:
                row_type: users
                sql: |-
                  select * from users
                  where id in (
                    select user_id from commits
                    where release_id=:id limit 5
                  )

This would set up a recent_releases root query which executes that SQL - and returns an object with a custom nested field called repo that expands to a repo (based on the repo column being a primary key for the repos table).

That nested recent_releases item also has a contributors field, which executes a custom query and returns rows that are then presented as if they had come from the automatically created users type.
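To make that concrete, here's the kind of query this configuration might enable, expressed as a Python call against the plugin's /graphql endpoint; the column names (full_name, login), the flat result shape and the port are purely illustrative.

import httpx

query = """
{
  recent_releases {
    id
    repo { full_name }
    contributors { login }
  }
}
"""
# Assumes a local Datasette with datasette-graphql running on port 8001.
response = httpx.post("http://localhost:8001/graphql", json={"query": query})
print(response.json())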


simonw commented Aug 9, 2020

This is pretty complex stuff! I'm not convinced by the names of things (especially row_type and pk_for_table). I also need to figure out how nodes/edges/pageInfo pagination would work.

Maybe there's a paginated: true option which turns on nodes/pageInfo for that field.


simonw commented Aug 9, 2020

I'm not going to try and get existing canned queries working with this, because of the need for custom syntax to define field types and nested fields.

simonw changed the title from Support for configured (maybe canned?) queries to Support for custom fields configured using SQL queries Aug 9, 2020

simonw commented Aug 22, 2020

The easiest way to implement this would be to use Generic JSON, as seen in #53. I could get this feature running really quickly with that.

The downside is that it wouldn't have the ability for the GraphQL query author to select exactly which columns came back. That seems like a shame - but the amount of work required to introspect the SQL query and figure out the columns isn't trivial.

Alternatively... let the configuration author choose. By default use Generic JSON, but if they take the time to define the columns that are returned by the query they get a real GraphQL schema out of it. Could look something like this:

plugins:
  datasette-graphql:
    databases:
      github:
        queries:
          releases_since:
            sql: |-
              select id, title, created_at from releases where created_at > :created_since
            fields:
              id: integer
              title: text
              created_at: text

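Internally that per-field type information could map straight onto GraphQL types - a rough sketch, assuming the graphene types the plugin already builds on, with anything unrecognized defaulting to String (the TEXT fallback discussed earlier). This is not existing plugin code.

import graphene

# Map the type names used in the configuration above onto graphene types.
TYPE_MAP = {
    "integer": graphene.Int,
    "text": graphene.String,
    "float": graphene.Float,
}

def fields_to_graphene(fields):
    return {
        name: TYPE_MAP.get(type_name, graphene.String)()
        for name, type_name in fields.items()
    }

print(fields_to_graphene({"id": "integer", "title": "text", "created_at": "text"}))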

simonw commented Aug 22, 2020

Given this configuration format it would still be nice to be able to define nested fields, potentially by optionally referencing existing tables.

simonw removed this from the 1.0 milestone Aug 23, 2020

simonw commented Aug 23, 2020

This should be designed at the same time as mutations support in #47 since they will likely work in a very similar way.
