Removal of Marlowe Runtime's intermediate Chain Indexer #768

jhbertra · 2023-11-29T22:06:06Z

jhbertra
Nov 29, 2023

Overview

This discussion aims to establish a plan to remove the marlowe-chain-sync and marlowe-chain-indexer from the Marlowe Runtime. These components account for a significant portion of resource consumption by the Runtime, limiting its scalability and impairing its performance. By analyzing the consumption patterns of downstream components, we can eliminate the need for these components without compromising the architectural flexibility they provide.

Benefits provided

The main benefit provided by the chain index is flexibility. Without it, components like marlowe-sync and marlowe-tx need to communicate directly with a node to index the information they need. This can take quite a long time when starting from genesis, and if these components change in such a way that they require new information, the whole process must start from scratch. By acting as a runtime-wide cache of block and transaction data, the chain index makes this far less expensive, and it is comparatively inexpensive to traverse.

This was very valuable in the early stages of Runtime development because these sorts of changes happened quite frequently. However, as the Runtime matures and stabilizes, they happen less often. It is also worth noting that this problem is still present if the chain index doesn't contain information that we will need at some point - notably scripts.

Costs

The cost of maintaining the chain index is high. It demands significant disk space (for mainnet, this is on the order of hundreds of gigabytes) from the host environment, which hurts scalability. Query performance can also be problematic due to the size of the databases. Both of these problems result from keeping far more data than the Runtime needs to function as a hedge against new information being required in the future. As stability and confidence in data requirements grow, these costs outweigh the benefits. The requirements proposed here show that we have reached the point where this tradeoff is no longer worth the cost.

What do we actually need?

There are two types of information we need from the chain: historical and current (i.e. UTxO). The scopes of these two categories are quite different. Historical information is only needed by marlowe-sync and its scope is limited to Marlowe contract and payout withdrawal transactions. UTxO information is needed by marlowe-tx to build transactions.

Some key observations to make here are:

History accounts for the vast majority of the space requirements of the current chain index
Marlowe contracts comprise a comparatively insignificant portion of overall chain size
Historical information requirements are limited to Marlowe contracts only and are only needed by marlowe-sync
UTxO information is easy to query from the node
UTxO size is small compared to the overall chain size
UTxO information is required by marlowe-tx only

We can leverage these observations to find an optimal solution:

Index all Marlowe transactions (this is currently handled by marlowe-indexer)
Index UTxO (TxIns only, not whole outputs) by address and contract ID

Indexing history

Not much needs to change from how this currently happens in marlowe-indexer. We can switch to indexing this information directly from a chain sync follower. In future, when we add additional information, we will need to reindex from genesis, but this should happen infrequently enough to be worthwhile.
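As a rough illustration of indexing history straight from a chain sync follower, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: the `RollForward` / `RollBackward` event shapes, the `Tx` record, and the assumption that each transaction exposes the set of addresses it touches. None of these are the Runtime's or the node's actual types. The point is only that non-Marlowe transactions are dropped on arrival and that a rollback discards rows past the rollback point.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Tx:
    tx_id: str
    # Addresses of spent inputs and created outputs (assumed to be available).
    touched_addresses: frozenset


@dataclass(frozen=True)
class RollForward:
    slot: int
    txs: tuple


@dataclass(frozen=True)
class RollBackward:
    slot: int  # everything after this slot is discarded


@dataclass
class HistoryIndex:
    # Marlowe and payout script addresses (hypothetical configuration).
    marlowe_addresses: frozenset
    rows: list = field(default_factory=list)  # (slot, tx_id) in chain order

    def apply(self, event: Union[RollForward, RollBackward]) -> None:
        if isinstance(event, RollForward):
            for tx in event.txs:
                # Keep only transactions that touch a tracked script address.
                if tx.touched_addresses & self.marlowe_addresses:
                    self.rows.append((event.slot, tx.tx_id))
        elif isinstance(event, RollBackward):
            self.rows = [(slot, tx_id) for slot, tx_id in self.rows if slot <= event.slot]
        else:
            raise TypeError(f"unexpected event: {event!r}")
```

Adding a new kind of information later would mean changing what `apply` stores and replaying events from genesis, which is the reindexing cost mentioned above.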

Indexing the UTxO

The UTxO index can be built by a new chain follower component for marlowe-tx. The UTxO index only needs to map certain lookup keys to TxIns (in a format that can be rolled back up to k blocks). It does not need to maintain the full output data because if we have the corresponding TxIds, it is efficient to query directly from the node.

The lookups we need to maintain are:

1. What are the current outputs for a given address?
2. What is the current output for a given contract ID?
3. What are the current helper script outputs for a given contract ID?
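To make the shape of such an index concrete, here is a minimal Python sketch of a key-to-TxIn map that can be rolled back up to k blocks. The class name, the `("address" | "contract" | "helper", identifier)` key encoding, and the per-block undo log are all assumptions made for illustration, not a description of how marlowe-tx would actually implement it.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

TxIn = Tuple[str, int]  # (transaction id, output index)
Key = Tuple[str, str]   # ("address" | "contract" | "helper", identifier)


class UtxoIndex:
    def __init__(self, k: int) -> None:
        self.by_key: Dict[Key, Set[TxIn]] = {}
        self.key_of: Dict[TxIn, Key] = {}
        # One undo record per block; records older than k blocks are dropped.
        self.undo_log: deque = deque(maxlen=k)

    def _add(self, key: Key, txin: TxIn) -> None:
        self.by_key.setdefault(key, set()).add(txin)
        self.key_of[txin] = key

    def _remove(self, txin: TxIn) -> Optional[Key]:
        key = self.key_of.pop(txin, None)
        if key is not None:
            self.by_key[key].discard(txin)
            if not self.by_key[key]:
                del self.by_key[key]
        return key

    def roll_forward(self, produced: Dict[TxIn, Key], consumed: Set[TxIn]) -> None:
        """Apply one block: index new outputs, then drop spent ones."""
        for txin, key in produced.items():
            self._add(key, txin)
        removed = {}
        for txin in consumed:
            key = self._remove(txin)
            if key is not None:  # untracked inputs are ignored
                removed[txin] = key
        self.undo_log.append((set(produced), removed))

    def roll_back(self) -> None:
        """Undo the most recent block."""
        if not self.undo_log:
            raise ValueError("cannot roll back further than k blocks")
        added, removed = self.undo_log.pop()
        for txin in added:
            self._remove(txin)
        for txin, key in removed.items():
            if txin not in added:  # produced and spent in the same block
                self._add(key, txin)

    def lookup(self, kind: str, identifier: str) -> Set[TxIn]:
        return set(self.by_key.get((kind, identifier), ()))
```

The index stores only TxIns. The full output data for a lookup result would be fetched from the node by TxIn.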

Given these lookups, we can build the UTxO required for:

Creation transactions (requires UTxO from wallet addresses only)
Apply inputs transactions (requires UTxO from wallet, current marlowe output, and current helper outputs)
Withdraw transaction (requires UTxO from wallet; payouts are explicitly provided in request)
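The mapping from transaction type to lookups can be sketched as a single function. This is only an illustration in Python: the function name, the string tags for transaction kinds, and the three lookup callables are hypothetical and do not correspond to marlowe-tx's real interface.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

TxIn = Tuple[str, int]                 # (transaction id, output index)
Lookup = Callable[[str], Set[TxIn]]    # identifier -> current TxIns


def txins_to_resolve(
    tx_kind: str,
    wallet_addresses: Iterable[str],
    by_address: Lookup,
    by_contract: Lookup,
    helpers_by_contract: Lookup,
    contract_id: Optional[str] = None,
) -> Set[TxIn]:
    """Collect the TxIns whose outputs must be fetched to build a transaction."""
    # Every transaction type needs the wallet's UTxO.
    txins: Set[TxIn] = set()
    for address in wallet_addresses:
        txins |= by_address(address)

    if tx_kind == "apply_inputs":
        if contract_id is None:
            raise ValueError("apply_inputs requires a contract ID")
        # Current Marlowe output plus current helper script outputs.
        txins |= by_contract(contract_id)
        txins |= helpers_by_contract(contract_id)
    elif tx_kind not in ("create", "withdraw"):
        # Withdrawals need no extra lookup: payouts come with the request.
        raise ValueError(f"unknown transaction kind: {tx_kind}")
    return txins
```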

bwbush · 2023-12-05T12:52:50Z

bwbush
Dec 5, 2023
Collaborator

The UTxO index can be built by a new chain follower component for marlowe-tx.

The node already supports UTxO queries. Why do we need a new follower for the UTxO index?

The node doesn't provide "2. What is the current output for a given contract ID?", but that information already resides in the marlowe schema.
Item "3. What are the current helper script outputs for a given contract ID?" could also reside in marlowe.

2 replies

jhbertra Dec 5, 2023
Author

The UTxO index can be built by a new chain follower component for marlowe-tx.

The node already supports UTxO queries. Why do we need a new follower for the UTxO index?

For performance - querying by address is linear, while querying by TxIn is logarithmic. However, if we can derive all the script-related information from the marlowe schema, then this benefit likely pales in comparison to the benefit of not needing to write an additional indexer.

This could be a case where we build it if and when we need it.
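To illustrate the complexity difference only, here is a small Python sketch of a UTxO set kept sorted by TxIn. It is not a model of the node's actual storage, which this thread doesn't describe, and the class and method names are made up. A lookup by TxIn is a binary search, while a lookup by address has to scan every entry.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Tuple

TxIn = Tuple[str, int]  # (transaction id, output index)


class SortedUtxo:
    def __init__(self, entries: Iterable[Tuple[TxIn, str]]) -> None:
        # Entries are (txin, address) pairs, kept sorted by txin.
        self._entries = sorted(entries)
        self._keys = [txin for txin, _ in self._entries]

    def by_txin(self, txin: TxIn) -> Optional[str]:
        """O(log n): binary search on the sorted keys."""
        i = bisect_left(self._keys, txin)
        if i < len(self._keys) and self._keys[i] == txin:
            return self._entries[i][1]
        return None

    def by_address(self, address: str) -> List[TxIn]:
        """O(n): every entry has to be inspected."""
        return [txin for txin, owner in self._entries if owner == address]
```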

* The node doesn't provide "2. What is the current output for a given contract ID?", but that information already resides in the `marlowe` schema.

Yes, this is true. We will need to be careful to drive all marlowe-tx transaction detection from marlowe-sync in that case, or we risk encountering the issue we faced a while ago where we had inconsistent sources of truth.

* Item "3. What are the current helper script outputs for a given contract ID?" could also reside in `marlowe`.

Yes, this can be done.

paluh Jan 20, 2024
Maintainer

I think that we can focus on our simple use cases and track these subsets of UTxOs:

UTxO set which is related to Marlowe contract threads.
UTxO set which contains role tokens related to active Marlowe contracts.
We can skip the UTxO set at addresses and either rely on the context provided by the wallet through CIP-30 getUtxos or dynamically query the node.
I would narrow tracking of UTxO holding roles for now only to our known policy tokens and require minting during the initial transaction.
We can also consider a configuration option for extra scripts which should be indexed and tracked (UTxO with roles with these scripts and reference scripts for them).

This should cover most of our basic requirements and extra needs like role locking scripts with a pretty lightweight resource footprint.
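A rough Python sketch of that filtering rule, under loud assumptions: the `Output` and `TrackingConfig` records, their field names, and the reason tags are all hypothetical, and a real indexer would work on actual ledger types. An output is tracked if it sits at a Marlowe script address, holds a token from a known role policy, or sits at one of the optionally configured extra scripts. Everything else is skipped.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Output:
    address: str
    policy_ids: frozenset  # minting policies of the tokens held in this output


@dataclass(frozen=True)
class TrackingConfig:
    marlowe_addresses: frozenset       # contract thread script addresses
    role_policy_ids: frozenset         # known role token policies
    extra_script_addresses: frozenset = frozenset()  # optional extra scripts


def tracking_reasons(output: Output, config: TrackingConfig) -> Set[str]:
    """Return why an output should be indexed; an empty set means skip it."""
    reasons: Set[str] = set()
    if output.address in config.marlowe_addresses:
        reasons.add("contract_thread")
    if output.policy_ids & config.role_policy_ids:
        reasons.add("role_token")
    if output.address in config.extra_script_addresses:
        reasons.add("extra_script")
    return reasons
```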

bwbush · 2023-12-05T13:03:59Z

bwbush
Dec 5, 2023
Collaborator

Query performance can also be problematic due to the size of the databases.

I don't think we have concrete evidence of this.

It's not clear to me what the fundamental cause of slow wallet queries is, and we should fully understand why, so that the lessons learned can be incorporated into the solution. Yes, I have observed slow transaction-building on mainnet due to a series of several queries that take a couple of minutes each, but this could be the result of the client's or MarloweTx's inefficient orchestration of transaction building, not the queries themselves. Or it could be that I observed this on a wallet with a pathologically complex history and contents.
We know that performance depends extremely strongly on the underlying hardware. For example, the sync time for mainnet varies from 6 hours to 10 days, depending upon disk bandwidth and other factors. Similarly for queries.
The upcoming benchmarking work might provide more insight into bottlenecks and performance targets.
Who are the deployers of this service, and what are their hardware constraints?
In contrast to the query performance, the sync performance of marlowe-chain-indexer is significantly better than all other general-purpose Cardano chain indexers. Do we want to give that up?

1 reply

jhbertra Dec 5, 2023
Author

I didn't mean to make performance seem like the main concern - it isn't. The primary drawbacks the chain indexer imposes are:

It is extremely wasteful. We do not need to index the whole chain; doing so increases our disk space footprint by orders of magnitude.
It adds unnecessary complexity to the architecture.

These two problems are reason enough to suggest that removing it is a step in the right direction.

In contrast to the query performance, the sync performance of marlowe-chain-indexer is significantly better than all other general-purpose Cardano chain indexers. Do we want to give that up?

This doesn't intrinsically create value for us - our product is not a general-purpose chain indexer. If the Runtime can operate more efficiently with a purpose-built, direct indexer instead of a general-purpose one, the fact that marlowe-chain-indexer is faster than the other solutions isn't relevant.

bwbush · 2023-12-05T13:18:38Z

bwbush
Dec 5, 2023
Collaborator

Overall, I think we should take a product- and evidence-based approach to this and reframe the question more radically. Here are a couple of alternatives for consideration.

1. Abandon PostgreSQL in favor of a column store, a NoSQL database, or an in-memory database. This will escape the performance issues arising from PostgreSQL (e.g., single-threaded queries).
2. Replace the existing indexers with Marconi or another existing solution. This will drastically lower maintenance and documentation costs.
3. Create a distributed index that is collaboratively indexed, perhaps based on a p2p technology like IPLD.
4. Meet with the node team to learn how they are improving node performance with high efficiency indexes, etc.
5. Retain marlowe-chain-indexer as an enterprise solution and develop a pure clientside, database-free solution (i.e., let the wallet and blockfrost provide the information).

The preponderance of the limited user-derived evidence we have suggests that users prefer the last item above.

1 reply

jhbertra Dec 5, 2023
Author

Considering this from a broader perspective is important, but so are the incremental steps to get from where we are to where we want to be. From where we are right now, removing the chain indexer is an improvement we can make immediately that will dramatically lower maintenance burdens for the reasons I outlined above. In fact, it is a necessary first step for some of the solutions you suggested:

1. Abandon PostgreSQL in favor of a column store, a NoSQL database, or an in-memory database. This will escape the performance issues arising from PostgreSQL (e.g., single-threaded queries).

This could be considered separately for the marlowe schema, though I can't see why this would be any better.

2. Replace the existing indexers with Marconi or another existing solution. This will drastically lower maintenance and documentation costs.

Again, this could be considered for the marlowe schema. Removing marlowe-chain-indexer would actually be a necessary first step in this direction, as was previously discussed with the Plutus tools team.

3. Create a distributed index that is collaboratively indexed, perhaps based on a p2p technology like IPLD.

This is yet another solution we could consider for the marlowe schema - albeit a bit more radical. We'd have to consider the tradeoff of requiring hosts to integrate with IPLD to run the runtime.

4. Meet with the node team to learn how they are improving node performance with high efficiency indexes, etc.

This doesn't seem like an alternative to removing the chain indexer, unless they will start supporting all the queries we need out-of-the-box.

5. Retain `marlowe-chain-indexer` as an enterprise solution and develop a pure clientside, database-free solution (i.e., let the wallet and blockfrost provide the information).

We could offer marlowe-chain-sync as a standalone SaaS product or an enterprise solution; it does provide numerous benefits over competing solutions. However, this seems like a major distraction from our core value proposition.

Out of curiosity, what would a pure client-side solution look like? The main difficulty I foresee is discovering, identifying and maintaining Marlowe contract history. We wouldn't want to index every Marlowe contract in such a solution.

bwbush · 2023-12-05T13:56:14Z

bwbush
Dec 5, 2023
Collaborator

The most complex example of a helper script is the Charli3 oracle bridge for Marlowe. It requires information about Datum and UTXOs for non-Marlowe scripts (not just for helper scripts) and querying reference UTXOs.

Should the revised indexing solution support such queries? Or would deployers of such contracts need to use other indexing solutions?

1 reply

jhbertra Dec 5, 2023
Author

We may not want to build support for this directly into the runtime. Adding open roles showed that this approach won't scale well, and it is too limiting. I think the way forward for these helper scripts is to switch to extensible transaction constraints, in which case the deployers of such contracts would be responsible for indexing the necessary data themselves.

bwbush · 2023-12-20T14:29:47Z

bwbush
Dec 20, 2023
Collaborator

Given the ever-larger resource footprint of marlowe-chain-indexer on mainnet (now 10.5 GB), I do think we should try to retire this service.

@palas?

0 replies
