Docs: Improve Conduit documentation. (#1371)
1 parent 727c4a8 · commit 79f1b0e · 8 changed files with 546 additions and 26 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Getting Started

## Installation

### Install from Source

1. Check out the repo, or download the source: `git clone https://github.com/algorand/indexer.git && cd indexer`
2. Run `make conduit`.
3. The binary is created at `cmd/conduit/conduit`.
### Go Install

Go installs of the indexer repo do not currently work because of its use of the `replace` directive to support the
go-algorand submodule.

**In Progress:** There is ongoing work to remove go-algorand entirely as a dependency of indexer/conduit. Once
that work is complete, users should be able to use `go install` to install binaries for this project.

## Getting Started
Conduit requires a configuration file to set up and run a data pipeline. To generate an initial skeleton for a conduit
config file, you can run `./conduit init`. This will set up a sample data directory with a config located at
`data/conduit.yml`.

You will need to manually edit the data in the config file, filling in a valid configuration for conduit to run.
You can find a valid config file in [Configuration.md](Configuration.md) or via the `conduit init` command.
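As a rough illustration of the shape of such a file, here is a minimal sketch. This is not the authoritative schema: run `./conduit init` and consult [Configuration.md](Configuration.md) for the real one, and treat every value below (addresses, tokens, connection strings) as a placeholder.

```yaml
# Illustrative skeleton only -- run `./conduit init` and see Configuration.md
# for the authoritative schema. All values here are placeholders.
log-level: INFO
importer:
  name: algod
  config:
    netaddr: "http://localhost:8080"
    token: "<algod api token>"
processors: []
exporter:
  name: postgresql
  config:
    connection-string: "host=localhost user=postgres dbname=conduit"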
Once you have a valid config file in a directory, `config_directory`, launch conduit with `./conduit -d config_directory`.

1. Run `./conduit init` to set up a sample data directory.
2. Follow the on-screen instructions to update the sample configuration in `data/conduit.yml`.
3. Run `./conduit -d data` to launch conduit with the `data` directory.
# Configuration and Plugins
Conduit comes with an initial set of plugins available for use in pipelines. For more information on the possible
plugins and how to include these plugins in your pipeline's configuration file, see [Configuration.md](Configuration.md).

# Tutorials
For more detailed guides, walkthroughs, and step-by-step writeups, take a look at some of our
[Conduit tutorials](./tutorials). Here are a few of the highlights:
* [How to write block data to the filesystem](./tutorials/WritingBlocksToFile.md)
* [A deep dive into the filter processor](./tutorials/FilterDeepDive.md)
* [The differences and migration paths between Indexer & Conduit](./tutorials/IndexerMigration.md)

---

## Filtering Transactions in Conduit

### Intro
Conduit provides individual documentation for each plugin in [docs/conduit/plugins](./plugins). However, the filter
processor in particular has a complex set of features which empower users to search for data within transactions.
This document will go through some of those features in detail, their use cases, and some examples.

### Logical Operators

The filter processor provides (at this time) two top-level logical operators, `any` and `all`. These are used to match
"sub-expressions" specified in the filters. For any set of expressions `e1, e2, e3, ... eN`, `any(e1,e2,e3,...eN)` will return
`true` if there exists an `eX` for `1 <= X <= N` where `eX` evaluates to `true`,
and `all(e1,e2,e3,...eN)` will return `true` if for every `X` from `1..N`, `eX` evaluates to `true`.

In simpler terms, `any` matches the transaction if at least one sub-expression matches, and `all` matches only if every
sub-expression matches.
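The semantics of `any` and `all` can be sketched in a few lines of Python. This is purely an illustration, not Conduit code: sub-expressions are modeled as predicates over a transaction dictionary, and the field names are hypothetical.

```python
# Illustration of `any` / `all` matching semantics -- not Conduit code.
# A sub-expression is modeled as a predicate over a transaction dict.

def any_match(sub_expressions, txn):
    # `any`: the transaction matches if at least one sub-expression matches.
    return any(expr(txn) for expr in sub_expressions)

def all_match(sub_expressions, txn):
    # `all`: the transaction matches only if every sub-expression matches.
    return all(expr(txn) for expr in sub_expressions)

# Hypothetical sub-expressions.
fee_over_1000 = lambda txn: txn.get("fee", 0) > 1000
is_payment = lambda txn: txn.get("type") == "pay"

txn = {"type": "pay", "fee": 500}
print(any_match([fee_over_1000, is_payment], txn))  # True: is_payment matches
print(all_match([fee_over_1000, is_payment], txn))  # False: the fee check fails
```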
### Sub-Expressions
So, what defines a sub-expression?

A sub-expression consists of 3 components.

#### `tag`
The tag identifies the field to attempt to match. The fields derive their tags according to the
[official reference docs](https://developer.algorand.org/docs/get-details/transactions/transactions/).
You can also attempt to match against the `ApplyData`, although this is not officially supported or documented.
Users interested in this will need to consult the official
[go-algorand](https://github.com/algorand/go-algorand/blob/master/data/transactions/transaction.go#L104) repository to match tags.

For now, we programmatically generate these fields into a map located in the
[filter package](https://github.com/algorand/indexer/blob/develop/conduit/plugins/processors/filterprocessor/fields/generated_signed_txn_map.go),
though this is not guaranteed to be the case.

Example:
```yaml
- tag: 'txn.snd'    # Matches the transaction sender
- tag: 'txn.apar.c' # Matches the clawback address of the asset params
- tag: 'txn.amt'    # Matches the amount of a payment transaction
```
#### `expression-type`
The expression type is a selection of one of the available methods for evaluating the expression. The current list of
types is:
* `exact`: exact match for string values.
* `regex`: applies regex rules to the matching.
* `less-than`: applies a numerical less-than expression.
* `less-than-equal`: applies a numerical less-than-or-equal expression.
* `greater-than`: applies a numerical greater-than expression.
* `greater-than-equal`: applies a numerical greater-than-or-equal expression.
* `equal`: applies a numerical equal expression.
* `not-equal`: applies a numerical not-equal expression.

You must use the proper expression type for the field your tag identifies, based on the type of data stored in that field.
For example, do not use a numerical expression type on a string field such as an address.

#### `expression`
The expression is the data against which each field will be compared. This must be compatible with the data type of
the expected field. For string fields you can also use the `regex` expression type to interpret the input of the
expression as a regex.
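For instance, a hypothetical filter using the `regex` expression type against the sender tag might look like the following. The pattern is purely illustrative:

```yaml
- filters:
    - any:
        - tag: "txn.snd"
          expression-type: "regex"
          expression: "^AAAA.*"
```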
### Examples

Find transactions with a fee greater than 1000 microalgos:
```yaml
- filters:
    - any:
        - tag: "txn.fee"
          expression-type: "greater-than"
          expression: "1000"
```

Find state proof transactions:
```yaml
- filters:
    - any:
        - tag: "txn.type"
          expression-type: "exact"
          expression: "stpf"
```

Find transactions calling the app "MYAPPID":
```yaml
- filters:
    - all:
        - tag: "txn.type"
          expression-type: "exact"
          expression: "appl"
        - tag: "txn.apid"
          expression-type: "exact"
          expression: "MYAPPID"
```

---

## Migrating from Indexer to Conduit

The [Algorand Indexer](https://github.com/algorand/indexer) provides both a block processing pipeline to ingest block
data from an Algorand node into a PostgreSQL database, and a REST API which serves that data.

The [Conduit](https://github.com/algorand/indexer/blob/develop/docs/Conduit.md) project provides a modular pipeline
system allowing users to construct block processing pipelines for a variety of use cases, as opposed to the single,
bespoke Indexer construction.

### Migration
Talking about a migration from Indexer to Conduit is in some ways difficult because the two have only partial overlap in
their applications. For example, Conduit does _not_ currently include a REST API, either for checking pipeline health
or for serving data from the pipeline.
Here is the Indexer architecture diagram at a high level. The raw block data is enriched by the account data retrieved
from the local ledger, and everything is written to PostgreSQL, which can then be queried via the API.
```mermaid
graph LR;
  algod["Algod"]
  index["Indexer"]
  ledger["Local Ledger"]
  psql["PostgreSQL"]
  restapi["REST API"]
  algod-->index;
  subgraph "Data Pipeline"
    index-->ledger;
    ledger-->index;
    index-->psql;
  end
  psql-->restapi;
  restapi-->psql;
```
However, Conduit was built to generalize and modularize a lot of the tasks which Indexer performs when ingesting block data
into its database. For that reason you can swap out the core data pipeline in Indexer with an equivalent Conduit
pipeline--and that's just what we've done!

```mermaid
graph LR;
  algod["Algod"]
  be["block_evaluator Processor"]
  pe["postgresql Exporter"]
  algodimp["algod Importer"]
  restapi["REST API"]
  algod-->algodimp;
  subgraph "Conduit Pipeline"
    algodimp-->be;
    be-->pe;
  end
  pe-->restapi;
```

The most recent release of Indexer creates a Conduit pipeline config and launches the pipeline to ingest the
data used to serve the REST API. Take a look
[here](https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L359) if you're interested in
seeing the exact config used in Indexer.
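A sketch of the shape such a pipeline config could take is below. This is illustrative only: the authoritative config is the one Indexer generates (see the daemon.go link above), all values here are placeholders, and plugin-specific config fields are abbreviated.

```yaml
# Illustrative sketch -- see the daemon.go link above for the config
# Indexer actually generates. Values are placeholders.
importer:
  name: algod
  config:
    netaddr: "http://localhost:8080"
    token: "<algod api token>"
processors:
  - name: block_evaluator
    config: {}
exporter:
  name: postgresql
  config:
    connection-string: "host=localhost user=postgres dbname=indexer"
```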
### Adopting Conduit features in your Indexer pipeline

Since Indexer is now using Conduit for its data pipeline, it will benefit from the continued development of the specific
plugins being used. However, we don't plan on exposing the full set of Conduit features through Indexer. In order to
start using new features, or new plugins to customize, filter, or further enrich the block data, or even to change the
type of DB used in the backend, you will need to separate Indexer's data pipeline into your own custom Conduit pipeline.
A common deployment of the Indexer might look something like this.
```mermaid
graph LR;
  algod["Algod"]
  lb["Load Balancer"]
  index["Indexer"]
  ro1["ReadOnly Indexer"]
  ro2["ReadOnly Indexer"]
  ro3["ReadOnly Indexer"]
  psql["PostgreSQL"]
  algod-->index;
  index-->psql;
  lb---index;
  lb---ro1;
  ro1---psql;
  lb---ro2;
  ro2---psql;
  lb---ro3;
  ro3---psql;
```
Because the database connection can only tolerate a single writer without race conditions and/or deadlocks,
Indexer offers a read-only mode which does not run the data pipeline and has no write access to the database. It's
common to use the read-only mode to scale out the REST API--running multiple web servers behind a load balancer, as
shown in the diagram.
Separating the data pipeline from the Indexer when using this setup is simple--take Indexer's Conduit config
[shown earlier](https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L359), write it
to a file, and launch the Conduit binary. Take a look at the [getting started guide](../GettingStarted.md) for more
information on installing and running Conduit.

We still plan on supporting the Indexer API alongside Conduit--that means that any changes made to the PostgreSQL plugin
will either be backwards compatible with the Indexer API or have corresponding fixes in Indexer.
Here is our architecture diagram with Conduit as the data pipeline.
```mermaid
graph LR;
  algod["Algod"]
  lb["Load Balancer"]
  ro1["ReadOnly Indexer"]
  ro2["ReadOnly Indexer"]
  ro3["ReadOnly Indexer"]
  psql["PostgreSQL"]
  be["block_evaluator Processor"]
  pe["postgresql Exporter"]
  algodimp["algod Importer"]
  pe-->psql;
  algod-->algodimp;
  subgraph "Conduit Pipeline"
    algodimp-->be;
    be-->pe;
  end
  lb---ro1;
  ro1---psql;
  lb---ro2;
  ro2---psql;
  lb---ro3;
  ro3---psql;
```

With this architecture you're free to do things like use filter processors to limit the size of your database--though
doing this will affect how some Indexer APIs function.