Docs: Improve Conduit documentation. (#1371)
Eric-Warehime authored Dec 15, 2022
1 parent 727c4a8 commit 79f1b0e
Showing 8 changed files with 546 additions and 26 deletions.
31 changes: 16 additions & 15 deletions docs/Conduit.md
@@ -13,18 +13,18 @@

# Algorand Conduit

Conduit is a framework which provides reusable components necessary for ingesting blocks from the Algorand blockchain into external applications. It is designed around a modular plugin system that allows users to configure their own data pipelines for filtering, aggregation, and storage of transactions and accounts on any Algorand network.
Conduit is a framework for ingesting blocks from the Algorand blockchain into external applications. It is designed as a modular plugin system that allows users to configure their own data pipelines for filtering, aggregation, and storage of transactions and accounts on any Algorand network.

# Getting Started

See the [Getting Started](conduit/GettingStarted.md) page.

## Building from source

Development is done using the [Go Programming Language](https://golang.org/), the version is specified in the project's [go.mod](go.mod) file.

Run `make` to build Conduit, the binary is located at `cmd/algorand-indexer/conduit`.

# Quickstart

See the [Getting Started](conduit/GettingStarted.md) page.

# Configuration

See the [Configuration](conduit/Configuration.md) page.
@@ -33,31 +33,30 @@ See the [Configuration](conduit/Configuration.md) page.

See the [Development](conduit/Development.md) page for building a plugin.

# Features

# Plugin System
A Conduit pipeline is composed of 3 components, [Importers](../conduit/plugins/importers/README.md), [Processors](../conduit/plugins/processors/README.md), and [Exporters](../conduit/plugins/exporters/README.md).
Every pipeline must define exactly 1 Importer, 1 Exporter, and can optionally define a series of 0 or more Processors.
A Conduit pipeline is composed of 3 components: [Importers](../conduit/plugins/importers/), [Processors](../conduit/plugins/processors/), and [Exporters](../conduit/plugins/exporters/).
Every pipeline must define exactly 1 Importer and exactly 1 Exporter, and can optionally define a series of 0 or more Processors.
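At the configuration level, those three components map onto sections of a pipeline config file. Here is a minimal sketch of that shape, assuming the `algod` importer, `filter_processor` processor, and `postgresql` exporter named elsewhere in these docs; the option keys shown are illustrative, not authoritative:

```yaml
# Sketch of a pipeline config: exactly 1 importer, 0 or more
# processors, exactly 1 exporter. Option keys are placeholders;
# see Configuration.md for the real reference.
importer:
  name: algod
  config:
    netaddr: "http://127.0.0.1:8080"   # local algod node (assumed key)
    token: "<algod api token>"
processors:
  - name: filter_processor
    config:
      filters: []                      # no filtering; pass blocks through
exporter:
  name: postgresql
  config:
    connection-string: "host=localhost user=algorand dbname=conduit"  # assumed key
```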

The original Algorand Indexer has been defined as a Conduit pipeline via the [algorand-indexer](../cmd/algorand-indexer/daemon.go) executable, see [Migrating from Indexer](#migrating-from-indexer)
The original Algorand Indexer has been defined as a Conduit pipeline via the [algorand-indexer](../cmd/algorand-indexer/daemon.go) executable, see [Migrating from Indexer](#migrating-from-indexer).

# Contributing

Contributions welcome! Please refer to our [CONTRIBUTING](https://github.com/algorand/go-algorand/blob/master/CONTRIBUTING.md) document for general contribution guidelines, and individual plugin documentation for contributing to new and existing Conduit plugins.
Contributions are welcome! Please refer to our [CONTRIBUTING](https://github.com/algorand/go-algorand/blob/master/CONTRIBUTING.md) document for general contribution guidelines, and individual plugin documentation for contributing to new and existing Conduit plugins.

## RFCs
## RFCs (Requests For Comment)
If you have an idea for how to improve Conduit that would require significant changes, open a [Feature Request Issue](https://github.com/algorand/indexer/issues/new/choose) to begin discussion. If the proposal is accepted, the next step is to define the technical direction and answer implementation questions via a PR containing an [RFC](./rfc/template.md).

You do _not_ need to open an RFC for adding a new plugin--follow the development guide for the corresponding plugin type.
You do _not_ need to open an RFC for adding a new plugin--if you want to discuss your plugin idea before implementation, you can open an initial PR requesting feedback.

<!-- USAGE_START_MARKER -->

# Common Setups

The most common usage of Conduit is expecting to be getting validated blocks from a local `algod` Algorand node, adding them to a database (such as [PostgreSQL](https://www.postgresql.org/)), and serving an API to make available a variety of prepared queries. Some users may wish to directly write SQL queries of the database.
The most common usage of Conduit is to get validated blocks from a local `algod` Algorand node and add them to a database (such as [PostgreSQL](https://www.postgresql.org/)).
Users can separately (outside of Conduit) serve that data via an API to make a variety of prepared queries available--this is what the Algorand Indexer does.

Conduit works by fetching blocks one at a time via the configured Importer, sending the block data through the configured Processors, and terminating block handling via an Exporter (traditionally a database).

For a step-by-step walkthrough of a basic Conduit setup, see [Writing Blocks To Files](./conduit/tutorials/WritingBlocksToFile.md).

<!-- USAGE_END_MARKER_LINE -->

@@ -66,3 +65,5 @@ Conduit works by fetching blocks one at a time via the configured Importer, send
Indexer was built in a way that strongly coupled it to Postgresql and the defined REST API. We've built Conduit in a way which is backwards compatible with the preexisting Indexer application. Running the `algorand-indexer` binary will use Conduit to construct a pipeline that replicates the Indexer functionality.

Going forward we will continue to maintain the Indexer application; however, our main focus will be enabling and optimizing a multitude of use cases through the Conduit pipeline design rather than the singular Indexer pipeline.

For a more detailed look at the differences between Conduit and Indexer, see [our migration guide](./conduit/tutorials/IndexerMigration.md).
41 changes: 33 additions & 8 deletions docs/conduit/GettingStarted.md
@@ -1,17 +1,42 @@
# Getting Started

## Install from Source

1. Checkout the repo, or download the source.
## Installation

### Install from Source

1. Check out the repo, or download the source: `git clone https://github.com/algorand/indexer.git && cd indexer`
2. Run `make conduit`.
3. The binary is created at `cmd/conduit/conduit`.

## Quick Start
### Go Install

Go installs of the indexer repo do not currently work because of its use of the `replace` directive to support the
go-algorand submodule.

**In Progress**
There is ongoing work to remove go-algorand entirely as a dependency of indexer/conduit. Once
that work is complete, users should be able to use `go install` to install binaries for this project.

## Getting Started

Conduit requires a configuration file to set up and run a data pipeline. To generate an initial skeleton for a conduit
config file, you can run `./conduit init`. This will set up a sample data directory with a config located at
`data/conduit.yml`.

You will need to manually edit the data in the config file, filling in a valid configuration for Conduit to run.
You can find an example of a valid config file in [Configuration.md](Configuration.md) or generate one via the `conduit init` command; a filled-in sketch is shown below.

Once you have a valid config file in a directory (for example, `config_directory`), launch conduit with `./conduit -d config_directory`.

1. Run `./conduit init` to setup a sample data directory.
2. Follow the on screen instructions to update the sample configuration in `data/conduit.yml`.
3. Run `./conduit -d data` to launch conduit with the `data` directory.
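For reference, here is a hypothetical sketch of what an edited `data/conduit.yml` might look like for a simple file-writing pipeline. The plugin names follow this repo's docs, but treat the option keys as assumptions and consult [Configuration.md](Configuration.md) for the authoritative reference:

```yaml
# data/conduit.yml -- hypothetical minimal pipeline: read blocks
# from a local algod node and write each block to disk.
importer:
  name: algod
  config:
    netaddr: "http://127.0.0.1:8080"   # assumed option key
    token: "<algod api token>"
processors: []    # no processors: blocks pass through unchanged
exporter:
  name: file_writer
  config:
    block-dir: "/tmp/conduit-blocks"   # assumed option key
```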

# Configuration
# Configuration and Plugins
Conduit comes with an initial set of plugins available for use in pipelines. For more information on the available
plugins and how to include them in your pipeline's configuration file, see [Configuration.md](Configuration.md).

See [Configuration.md](Configuration.md) for more details.
# Tutorials
For more detailed guides, walkthroughs, and step-by-step writeups, take a look at some of our
[Conduit tutorials](./tutorials). Here are a few of the highlights:
* [How to write block data to the filesystem](./tutorials/WritingBlocksToFile.md)
* [A deep dive into the filter processor](./tutorials/FilterDeepDive.md)
* [The differences and migration paths between Indexer & Conduit](./tutorials/IndexerMigration.md)
93 changes: 93 additions & 0 deletions docs/conduit/tutorials/FilterDeepDive.md
@@ -0,0 +1,93 @@
## Filtering Transactions in Conduit

### Intro
Conduit provides individual documentation for each plugin in [docs/conduit/plugins](./plugins). However, the filter
processor in particular has a complex set of features which empower users to search for data within Transactions.
This document goes through some of those features in detail, covers their use cases, and shows some examples.

### Logical Operators

The filter processor provides (at this time) two top-level logical operators, `any` and `all`. These are used to match
"sub-expressions" specified in the filters. For any set of expressions e1, e2, e3, ... eN, `any(e1,e2,e3,...eN)` will return
`true` if there exists an `eX` for `1 <= X <= N` where `eX` evaluates to `true`,
and `all(e1,e2,e3,...eN)` will return `true` if for every `X` from `1..N`, `eX` evaluates to `true`.

In simpler terms, `any` matches the transaction if at least one sub-expression matches, and `all` matches only if every
sub-expression matches.

### Sub-Expressions
So, what defines a sub-expression?

A sub-expression consists of 3 components:
#### `tag`
The tag identifies the field to attempt to match. The fields derive their tags according to the
[official reference docs](https://developer.algorand.org/docs/get-details/transactions/transactions/).
You can also attempt to match against the `ApplyData`, although this is not officially supported or documented.
Users interested in this will need to consult the official
[go-algorand](https://github.com/algorand/go-algorand/blob/master/data/transactions/transaction.go#L104) repository to match tags.


For now, we programmatically generate these fields into a map located in the
[filter package](https://github.com/algorand/indexer/blob/develop/conduit/plugins/processors/filterprocessor/fields/generated_signed_txn_map.go),
though this is not guaranteed to be the case.


Example:
```yaml
- tag: 'txn.snd' # Matches the Transaction Sender
- tag: 'txn.apar.c' # Matches the Clawback address of the asset params
- tag: 'txn.amt' # Matches the amount of a payment transaction
```
#### `expression-type`
The expression type selects one of the available methods for evaluating the expression. The current list of
types is:
* `exact`: exact match for string values.
* `regex`: applies regex rules to the matching.
* `less-than`: applies a numerical less-than expression.
* `less-than-equal`: applies a numerical less-than-or-equal expression.
* `greater-than`: applies a numerical greater-than expression.
* `greater-than-equal`: applies a numerical greater-than-or-equal expression.
* `equal`: applies a numerical equals expression.
* `not-equal`: applies a numerical not-equal expression.

You must use the proper expression type for the field your tag identifies, based on the type of data stored in that field.
For example, do not use a numerical expression type on a string field such as an address.


#### `expression`
The expression is the data against which each field will be compared. This must be compatible with the data type of
the expected field. For string fields you can also use the `regex` expression type to interpret the input of the
expression as a regex.
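For instance, here is a sketch of a `regex` expression on a string field; the pattern itself is illustrative, not a meaningful address prefix:

```yaml
# Match transactions whose sender address starts with "AAAA".
- filters:
    - any:
        - tag: "txn.snd"
          expression-type: "regex"
          expression: "^AAAA.*"
```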

### Examples

Find transactions with a fee greater than 1000 microalgos:
```yaml
- filters:
    - any:
        - tag: "txn.fee"
          expression-type: "greater-than"
          expression: "1000"
```

Find state proof transactions:
```yaml
- filters:
    - any:
        - tag: "txn.type"
          expression-type: "exact"
          expression: "stpf"
```

Find transactions calling the app "MYAPPID":
```yaml
- filters:
    - all:
        - tag: "txn.type"
          expression-type: "exact"
          expression: "appl"
        - tag: "txn.apid"
          expression-type: "exact"
          expression: "MYAPPID"
```
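To use any of these filters in a pipeline, the `filters` block sits under the filter processor's entry in the pipeline config. Here is a hedged sketch of that placement, assuming the plugin is registered as `filter_processor`; the importer and exporter entries are placeholders with assumed option keys:

```yaml
# Illustrative placement of a filter inside a pipeline config.
importer:
  name: algod
  config:
    netaddr: "http://127.0.0.1:8080"   # assumed option key
    token: "<algod api token>"
processors:
  - name: filter_processor
    config:
      filters:
        - any:
            - tag: "txn.type"
              expression-type: "exact"
              expression: "pay"        # keep only payment transactions
exporter:
  name: file_writer
  config: {}                           # exporter config omitted for brevity
```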
132 changes: 132 additions & 0 deletions docs/conduit/tutorials/IndexerMigration.md
@@ -0,0 +1,132 @@
## Migrating from Indexer to Conduit

The [Algorand Indexer](https://github.com/algorand/indexer) provides both a block processing pipeline to ingest block
data from an Algorand node into a Postgresql database, and a REST API which serves that data.

The [Conduit](https://github.com/algorand/indexer/blob/develop/docs/Conduit.md) project provides a modular pipeline
system allowing users to construct block processing pipelines for a variety of use cases as opposed to the single,
bespoke Indexer construction.

### Migration
Describing a migration from Indexer to Conduit is in some ways difficult because the two only partially overlap in
their applications. For example, Conduit does _not_ currently include a REST API, either for checking pipeline health
or for serving data from the pipeline.

Here is the Indexer architecture diagram at a high level. The raw block data is enriched by the account data retrieved
from the local ledger, and everything is written to Postgresql, which can then be queried via the API.
```mermaid
graph LR;
algod["Algod"]
index["Indexer"]
ledger["Local Ledger"]
psql["Postgresql"]
restapi["Rest API"]
algod-->index;
subgraph "Data Pipeline"
index-->ledger;
ledger-->index;
index-->psql;
end
psql-->restapi;
restapi-->psql;
```

However, Conduit was built to generalize and modularize a lot of the tasks which Indexer does when ingesting block data
into its database. For that reason you can swap out the core data pipeline in Indexer with an equivalent Conduit
pipeline--and that's just what we've done!

```mermaid
graph LR;
algod["Algod"]
be["block_evaluator Processor"]
pe["postgresql Exporter"]
algodimp["algod Importer"]
restapi["Rest API"]
algod-->algodimp
subgraph "Conduit Pipeline"
algodimp-->be;
be-->pe
end
pe-->restapi;
```

The most recent release of Indexer creates a Conduit pipeline config and launches the pipeline to ingest the
data used to serve the REST API. Take a look
[here](https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L359) if you're interested in
seeing the exact config used in Indexer.
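For orientation, here is a sketch of what such a pipeline config might look like, mirroring the diagram above. The plugin names (`algod`, `block_evaluator`, `postgresql`) come from this document, but the option keys are assumptions; consult the linked daemon.go for the exact config:

```yaml
# Hypothetical sketch of the Indexer-equivalent Conduit pipeline.
importer:
  name: algod
  config:
    netaddr: "http://127.0.0.1:8080"   # assumed option key
    token: "<algod api token>"
processors:
  - name: block_evaluator
    config:
      algod-data-dir: "/var/lib/algorand"   # assumed option key
exporter:
  name: postgresql
  config:
    connection-string: "host=localhost user=algorand dbname=indexer"  # assumed key
```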

### Adopting Conduit features in your Indexer pipeline

Since Indexer is now using Conduit for its data pipeline, it will benefit from the continued development of the specific
plugins being used. However, we don't plan on exposing the full set of Conduit features through Indexer. In order to
start using new features, or new plugins to customize, filter, or further enrich the block data, or even change the
type of DB used in the backend, you will need to separate Indexer's data pipeline into your own custom Conduit pipeline.

A common deployment of the Indexer might look something like this.
```mermaid
graph LR;
algod["Alogd"]
lb["Load Balancer"]
index["Indexer"]
ro1["ReadOnly Indexer"]
ro2["ReadOnly Indexer"]
ro3["ReadOnly Indexer"]
psql["Postgresql"]
algod-->index;
index-->psql;
lb---index;
lb---ro1;
ro1---psql;
lb---ro2;
ro2---psql;
lb---ro3;
ro3---psql;
```
Because the database connection can only tolerate a single writer without having race conditions and/or deadlocks,
Indexer offers a read-only mode which does not run the data pipeline and has no write access to the database. It's
common to use the read-only mode to scale out the REST API--running multiple web servers behind a load balancer, as
shown in the diagram.


Separating the data pipeline from the Indexer when using this setup is simple--take Indexer's Conduit config
[shown earlier](https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L359), write it
to a file, and launch the Conduit binary. Take a look at the [getting started guide](../GettingStarted.md) for more
information on installing and running Conduit.

We still plan on supporting the Indexer API alongside Conduit--that means that any changes made to the Postgresql plugin
will be backwards compatible with the Indexer API and/or have corresponding fixes in Indexer.

Here is our architecture diagram with Conduit as our data pipeline.
```mermaid
graph LR;
algod["Alogd"]
lb["Load Balancer"]
ro1["ReadOnly Indexer"]
ro2["ReadOnly Indexer"]
ro3["ReadOnly Indexer"]
psql["Postgresql"]
be["block_evaluator Processor"]
pe["postgresql Exporter"]
algodimp["algod Importer"]
pe-->psql;
algod-->algodimp;
subgraph "Conduit Pipeline"
algodimp-->be;
be-->pe;
end
lb---ro1;
ro1---psql;
lb---ro2;
ro2---psql;
lb---ro3;
ro3---psql;
```

With this architecture you're free to do things like use filter processors to limit the size of your database, as
sketched below--though doing so will affect how some Indexer APIs function.
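For example, here is a hedged sketch of a `processors` section that keeps only payment transactions before they reach the postgresql exporter, using the tag and type values described in the filter deep dive:

```yaml
# Illustrative: shrink the database by filtering before export.
processors:
  - name: filter_processor
    config:
      filters:
        - any:
            - tag: "txn.type"
              expression-type: "exact"
              expression: "pay"   # keep only payment transactions
```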
