MVP: Remotely Callable Tasks #1274

cmcarthur · 2019-02-04T21:21:32Z

Background: #1141

In the Alternative Entrypoints issue, we discussed two fundamental changes to how users are able to interact with dbt:

1. Give dbt a way to load a manifest file from disk and deserialize it
2. Give dbt a way to use that manifest to take small, specific actions, e.g.
   - compile a SQL string
   - run a SQL query against the warehouse

This issue expands on the second of the two, giving dbt a way to use a manifest to take small, specific actions, e.g. compile and run arbitrary SQL queries.

We plan to accomplish the first pass at this by implementing new tasks and new task types that allow dbt to operate as a JSON-RPC server. This is very much intended to be an MVP, and we'll plan to expand the breadth and depth of interactivity rapidly to enable more use cases.

There are four tasks / task types to be implemented:

ServerTask: On startup, performs a full dbt compile, then operates as a JSON-RPC server for handling interactive requests.
RemoteCallableTask (abstract): Designates a dbt Task as being remotely callable. Properties to be implemented by subclasses include: method name, whether it is callable synchronously and/or asynchronously. (We can punt on async for the immediate future since this issue only requires sync calls.) Properties to be implemented by this class include: standard exception handler, logging of request/response cycle. It should also register each task as a unique method in the JSON-RPC server.
CompileSQLTask: see below
RunSQLTask: see below
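To make the RemoteCallableTask responsibilities concrete, here is a minimal sketch of how the base class could register methods, log the request/response cycle, and wrap exceptions. Everything in it is an assumption for illustration: `METHOD_REGISTRY`, `METHOD_NAME`, `handle_request`, and the shape of the error response are hypothetical names, not dbt's actual implementation.

```python
import abc
import logging

logger = logging.getLogger(__name__)

# Hypothetical registry mapping JSON-RPC method names to task classes.
METHOD_REGISTRY = {}


class RemoteCallableTask(abc.ABC):
    """Sketch of the abstract base described above; not dbt's real class."""

    # Properties to be set by subclasses.
    METHOD_NAME = None
    SUPPORTS_SYNC = True
    SUPPORTS_ASYNC = False  # async is punted on for now

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only register classes that declare their own method name, so a
        # subclass that merely inherits a name is not registered twice.
        name = cls.__dict__.get("METHOD_NAME")
        if name is None:
            return
        if name in METHOD_REGISTRY:
            raise ValueError("duplicate method name: %s" % name)
        METHOD_REGISTRY[name] = cls

    @abc.abstractmethod
    def handle_request(self, **kwargs):
        """Do the task's work and return the contents of the data field."""

    def __call__(self, request_id, **kwargs):
        # Standard logging and exception handling shared by every task.
        logger.info("request id=%s method=%s", request_id, self.METHOD_NAME)
        try:
            data = self.handle_request(**kwargs)
            response = {"id": request_id, "result": "Success.", "data": data}
        except Exception as exc:
            logger.exception("request id=%s failed", request_id)
            # Assumed error shape; the issue does not define one.
            response = {"id": request_id, "error": {"message": str(exc)}}
        logger.info("response id=%s ok=%s", request_id, "error" not in response)
        return response
```

With this shape, the server only needs to look up `METHOD_REGISTRY[method]`, instantiate the class, and call it with the request id and kwargs.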

CompileSQLTask

The CompileSQLTask takes a base64-encoded Jinja SQL string as an argument, and spits out a compiled version of that SQL string. Extends RemoteCallableTask. Synchronous only.

kwargs:

base64-encoded jinja sql
timeout_seconds: A limit in seconds to put on the compilation. None means no timeout, and I imagine that is a reasonable place to start.

returns:

{
  "id": "<uuid>",
  "result": "Success.",
  "data": {
    "raw_sql": "...",
    "compiled_sql": "...",
    "timing": [
      {
        "type": "compilation",
        "started_at": "...",
        "finished_at": "..."
      }
    ]
  }
}
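For reference, a client request for this task might be built roughly as follows. The `jsonrpc`, `method`, `params`, and `id` fields come from the JSON-RPC 2.0 request format; the method name `compile_sql` and the parameter key `sql` are assumptions, since this issue does not name them.

```python
import base64
import json
import uuid


def build_request(method, jinja_sql, timeout_seconds=None):
    """Build a JSON-RPC 2.0 request carrying base64-encoded Jinja SQL."""
    encoded = base64.b64encode(jinja_sql.encode("utf-8")).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "method": method,  # hypothetical method name
        "id": str(uuid.uuid4()),
        "params": {
            "sql": encoded,  # hypothetical parameter key
            "timeout_seconds": timeout_seconds,  # None means no timeout
        },
    }


def decode_sql(params):
    """Server-side counterpart: recover the Jinja SQL string."""
    return base64.b64decode(params["sql"]).decode("utf-8")


request = build_request("compile_sql", "select * from {{ ref('my_model') }}")
print(json.dumps(request, indent=2))
```

Base64 keeps quotes, newlines, and Jinja delimiters in the SQL from needing any escaping beyond what JSON already does.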

RunSQLTask

The RunSQLTask does the same stuff as CompileSQLTask, but in addition it actually runs the compiled SQL and returns the query results in tabular format. Extends CompileSQLTask. Sync only.

kwargs:

base64-encoded jinja sql
timeout???

returns:

{
  "id": "<uuid>",
  "result": "Success.",
  "data": {
    "raw_sql": "...",
    "compiled_sql": "...",
    "timing": [
      {
        "type": "compilation",
        "started_at": "...",
        "finished_at": "..."
      },
      {
        "type": "execute",
        "started_at": "...",
        "finished_at": "..."
      }
    ],
    "table": ... tabular format ...
  }
}
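One way to produce the two timing entries above is a small context manager wrapped around each phase. This is only a sketch: `compile_fn` and `execute_fn` are hypothetical callables standing in for dbt's compiler and adapter, ISO 8601 timestamps are an assumption, and whatever `execute_fn` returns is passed through untouched because the tabular format is not specified here.

```python
from contextlib import contextmanager
from datetime import datetime, timezone


def now_iso():
    # Assumed timestamp format; the issue only shows "...".
    return datetime.now(timezone.utc).isoformat()


@contextmanager
def timed(timing, phase):
    """Append a {type, started_at, finished_at} entry for one phase."""
    entry = {"type": phase, "started_at": now_iso()}
    try:
        yield entry
    finally:
        entry["finished_at"] = now_iso()
        timing.append(entry)


def run_sql(raw_sql, compile_fn, execute_fn):
    """Compile, then execute, recording the timing of each step."""
    timing = []
    with timed(timing, "compilation"):
        compiled_sql = compile_fn(raw_sql)
    with timed(timing, "execute"):
        table = execute_fn(compiled_sql)
    return {
        "raw_sql": raw_sql,
        "compiled_sql": compiled_sql,
        "timing": timing,
        "table": table,  # shape is whatever execute_fn returns
    }
```

Because RunSQLTask extends CompileSQLTask, the compilation entry would come from the shared compile step and the execute entry from the extra step RunSQLTask adds.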

Implementation Notes

Manifests

We'll need the ability to take a Real Manifest and a manifest partial representing a single fake "node" and compile only the single fake "node". We should not have to implement any fancy methods of combining multiple manifests into one since the fake "node" should never overlap with a real node in the real manifest. THIS MEANS THAT COMPILING CUSTOM MACROS WILL NOT BE SUPPORTED BY THIS VERSION. But, that's ok for right now. We can solve the technical challenges involved with incorporating these partial manifests later on.
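As a rough illustration of why no merging logic is needed, the fake node can be layered over the real nodes as a read-through view and compiled on its own. The mapping of node ids to nodes is a simplified stand-in for the manifest, and `compile_node` is a hypothetical callable, not a dbt function.

```python
from collections import ChainMap


def compile_single_node(real_nodes, fake_id, fake_node, compile_node):
    """Compile only the fake node, with the real nodes visible for lookups.

    real_nodes: mapping of node id -> node (simplified manifest stand-in)
    compile_node: hypothetical callable taking (node, all_nodes)
    """
    # The fake node should never overlap with a real one, so a collision
    # is treated as an error rather than something to merge.
    if fake_id in real_nodes:
        raise ValueError("fake node id collides with a real node: %s" % fake_id)
    # ChainMap layers the fake node over the real nodes without copying
    # or mutating the real manifest.
    combined = ChainMap({fake_id: fake_node}, real_nodes)
    return compile_node(combined[fake_id], combined)
```

The real manifest is left untouched, so the same loaded manifest can serve many requests.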

dbt's JSON-RPC spec

To start, dbt should use the minimal JSON-RPC spec, and lean on its JSON schemas to provide contracts for its responses. But, whenever possible, we should put meaningful data in the data field of the response, so that we have room to expand the set of required fields later on.

Tasks

Tasks currently take all of their inputs via configs. This is OK, but for this functionality to be maximally useful it would be better if they accepted a structured set of kwargs either at runtime or instantiation time. For example, you could create a RunTask with a dynamic selection syntax.
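A sketch of what kwargs at both instantiation time and runtime could look like is below. The class is a hypothetical stand-in rather than dbt's actual RunTask, and the `models` and `threads` keys and the precedence order are assumptions.

```python
class RunTask:
    """Hypothetical stand-in showing kwargs layered over config."""

    def __init__(self, config, **kwargs):
        self.config = config
        self.kwargs = kwargs  # instantiation-time kwargs

    def resolve_args(self, **overrides):
        # Assumed precedence: config < instantiation kwargs < runtime kwargs.
        merged = dict(self.config)
        merged.update(self.kwargs)
        merged.update(overrides)
        return merged


task = RunTask({"models": None, "threads": 4}, models=["my_model"])
# A remote caller could then supply a selection dynamically at call time.
args = task.resolve_args(models=["my_model+"])
```

Keeping config as the lowest layer means existing command-line behavior stays the same when no kwargs are passed.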

beckjake · 2019-03-05T14:07:46Z

Resolved in #1301

cmcarthur mentioned this issue Feb 5, 2019

Make it possible to run multiple Tasks in a single dbt process #1276

Closed

drewbanin added this to the Wilt Chamberlain milestone Feb 13, 2019

beckjake mentioned this issue Feb 19, 2019

RPC server (#1274) #1301

Merged

beckjake added a commit that referenced this issue Mar 5, 2019

Merge pull request #1301 from fishtown-analytics/feature/rpc-tasks

0a4eea4

RPC server (#1274)

beckjake closed this as completed Mar 5, 2019
