Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: delete orphan nodes by traversing trees #641

Closed
wants to merge 4 commits into from

Conversation

yihuang
Copy link
Collaborator

@yihuang yihuang commented Dec 5, 2022

Don't need the orphan bookkeepings.

Consequences

Positives

  • don't need to store and maintain the orphan records.
    • smaller db size
    • faster set/delete operations

Negatives

  • The pruning operation should be slower than current approach, especially pruning by range, the theoretical complexity should be the same, O(N) where N is the size of orphaned nodes, but the constant factor is heavier, the new approach need to load the nodes, while the old approach just iterate orphan records and delete nodes by hashes.

Alternative

If it's considered too controversial to downgrade performance of online pruning, an alternative is to just provide an option to not storing the orphan records for archive nodes, who can always do pruning offline using the algorithm described here.

@yihuang yihuang changed the title delete orphan nodes by diff tree feat: delete orphan nodes by diff tree Dec 5, 2022
@yihuang yihuang changed the title feat: delete orphan nodes by diff tree feat: delete orphan nodes by traversing trees Dec 5, 2022
@cool-develope
Copy link
Collaborator

@yihuang , this PR looks like it conflicted with my work. And your approach is not working on the current iavl structure. I think it is possible under the assumption of keeping always the range versions, right?

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 6, 2022

@yihuang , this PR looks like it conflicted with my work. And your approach is not working on the current iavl structure. I think it is possible under the assumption of keeping always the range versions, right?

it do works on current iavl structure, I think it's agnostic to node key format, I'm able to do round-trip test on our testnet production db using the python implementation.
What do you mean by range versions?

@cool-develope
Copy link
Collaborator

cool-develope commented Dec 6, 2022

for example, we are keeping the versions of 3, 5, 7, 9.
In that case, you can remove version 5? or if we are keeping the versions of 5 to 10, you can remove the version 8?

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 6, 2022

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

@cool-develope
Copy link
Collaborator

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

I think it doesn't work properly, for example we have version 2 node in version 5, this node is removed in version 7, so it become orphaned, when will remove this node (version 2)?

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 6, 2022

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

I think it doesn't work properly, for example we have version 2 node in version 5, this node is removed in version 7, so it become orphaned, when will remove this node (version 2)?

yeah, you have a point, if the version 2 is already deleted.

To make it work, we need to check if the version of orphaned nodes still exists, which will further slow down the process. but it's not an issue in the default mode that loads all the versions into memory on startup.

@cool-develope
Copy link
Collaborator

yeah, you have a point.

This is why the current orphans keep fromVersion and toVersion fields, To remove orphans completely, we should assume keeping only the range of versions (for example 1000 - 1100, 1100 is the latest version, 1000 is the oldest available version)

@cool-develope
Copy link
Collaborator

please refer cosmos/cosmos-sdk#12989

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 6, 2022

yeah, you have a point.

This is why the current orphans keep fromVersion and toVersion fields, To remove orphans completely, we should assume keeping only the range of versions (for example 1000 - 1100, 1100 is the latest version, 1000 is the oldest available version)

What do you think if we delete the orphaned nodes whose version is removed, maybe not so bad if the versions are cached in memory.

@cool-develope
Copy link
Collaborator

cool-develope commented Dec 6, 2022

What do you think if we delete the orphaned nodes whose version is removed, maybe not so bad if the versions are cached in memory.

not sure for me, we already had same issues when load all versions in the archive node, so I am suggesting keep only the first and last version instead of versions map, for pruning it will provide only can delete the first version. Like this way we can delete the oldest versions one by one

@cool-develope
Copy link
Collaborator

anyhow, I am in preparing of the separated PR to remove orphans, I am worried we are on same topic

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 6, 2022

What do you think if we delete the orphaned nodes whose version is removed, maybe not so bad if the versions are cached in memory.

not sure for me, we already had same issues when load all versions in the archive node, so I am suggesting keep only the first and last version instead of versions map

The lazy mode?

for pruning it will provide only can delete the first version. Like this way we can delete the oldest versions one by one

delete the first version is a special case in this PR, the previous version will be 0, and it'll simply delete all orphaned nodes.

@cool-develope
Copy link
Collaborator

tbh, the lazy mode is unclear to me. When will trigger the lazy load or function?

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 6, 2022

tbh, the lazy mode is unclear to me. When will trigger the lazy load or function?

Whenever you need the root information of a particular version, get from cache or load from db?

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 7, 2022

for example, we are keeping the versions of 3, 5, 7, 9. In that case, you can remove version 5?

yes, by diff 5 and 7, delete orphaned nodes whose version > 3.

I think it doesn't work properly, for example we have version 2 node in version 5, this node is removed in version 7, so it become orphaned, when will remove this node (version 2)?

After multiple failures to reproduce the issue, I realize the issue don't exists, the reasoning is like this:

  • If a node is created in version 2 and referenced by version 5, it must also be referenced by all the versions between 2 and 5.
  • so checking the node version against 3 is enough, because 3 must have referenced the node too, and the node will eventually be deleted when version 3 is deleted.

if l.height <= 0 {
panic("already at leaf layer")
}
nodes := make([]*Node, 0, len(l.nodes)*2+len(l.pendingNodes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if need this allocate since it's assign nodes = ... later

Copy link
Collaborator Author

@yihuang yihuang Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean nodes = append(nodes, ...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea seems optional, just not sure why keep capacity when assign

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea seems optional, just not sure why keep capacity when assign

capacity is to reduce the number of reallocations during the append.

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 7, 2022

@yihuang , this PR looks like it conflicted with my work.

Sorry, I don't know about that, I thought you are working for solutions for the new node key format.
Do you think this approach make sense, what's the difference with your plan?

@yihuang yihuang marked this pull request as ready for review December 7, 2022 08:12
@yihuang yihuang requested a review from a team as a code owner December 7, 2022 08:12
@cool-develope
Copy link
Collaborator

Sorry, I don't know about that, I thought you are working for solutions for the new node key format.
Do you think this approach make sense, what's the difference with your plan?

Never mind, I'd like to refactor the diff algorithm.

// Traverse the subtree with a given node as the root.
func (ndb *nodeDB) traverseTree(hash []byte, fn func(node *Node) (bool, error)) error {
	if len(hash) == 0 {
		return nil
	}

	node, err := ndb.GetNode(hash)
	if err != nil {
		return err
	}

	stop, err := fn(node)
	if err != nil || stop {
		return err
	}

	if node.leftHash != nil {
		if err := ndb.traverseTree(node.leftHash, fn); err != nil {
			return err
		}
	}
	if node.rightHash != nil {
		if err := ndb.traverseTree(node.rightHash, fn); err != nil {
			return err
		}
	}

	return nil
}


func (ndb *nodeDB) deleteOrphans(version int64) ([][]byte, error) {
	nRoot, err := ndb.getRoot(version + 1)
	if err != nil {
		return nil, err
	}

	originalNodes := make([]*Node, 0)
	if err := ndb.traverseTree(nRoot, func(node *Node) (bool, error) {
		if node.version > version {
			return false, nil
		}
		originalNodes = append(originalNodes, node)
		return true, nil
	}); err != nil {
		return nil, err
	}

	cRoot, err := ndb.getRoot(version)
	if err != nil {
		return nil, err
	}
	
	index := 0
	if err := ndb.traverseTree(cRoot, func(node *Node) (bool, error) {
		if index < len(originalNodes) && bytes.Equal(node.hash, originalNodes[index].hash) {
			index++
			return true, nil
		}
		if err = ndb.batch.Delete(ndb.nodeKey(node.hash)); err != nil {
			return true, err
		}
		return false, nil
	}); err != nil {
		return nil, err
	}
	return orphans, err
}

Loading all orphans in memory is so expensive, and also your implementation looks complicated.

@cool-develope
Copy link
Collaborator

FYI, removing orphans will impact node key refactoring, this is why I am interested in your PR.

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 7, 2022

Loading all orphans in memory is so expensive, and also your implementation looks complicated.

My algorithm is mainly try to skip the common subtrees early on, so we only need to traverse the branches contains actual differences.

- removed the orphan bookkeepings
yihuang added a commit to yihuang/iavl that referenced this pull request Dec 8, 2022
@yihuang
Copy link
Collaborator Author

yihuang commented Dec 9, 2022

To help understanding the algorithm, i created some visualizations:

The example iavl tree, with versions 4 and 5, the nodes are labaled as $VERSION: $KEY\nheight: $HEIGHT:
example-tree.pdf

The pruning graph demonstrate the result of deleting version 4, it contains all the nodes loaded by the algorithm, the deleted nodes have dotted lines, you can notice there are two nodes coming from version 2 and 3 which are not deleted:
pruning-graph.pdf

Another one done on our production testnet db:
graph-production2.pdf

@tac0turtle
Copy link
Member

curious to hear if there is a performance improvement with this over what exists?

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 9, 2022

curious to hear if there is a performance improvement with this over what exists?

The pruning operation is definitely slower than current one, which just iterate the maintained orphan records, the new methods need to load some nodes to partially traverse the tree.
The main advantages are:

  • db size reduction, don't need to store the orphan records (24% reduction in my test on our testnet rocksdb node).
  • main operations faster, don't need to calculate and insert the orphan records.

@tac0turtle
Copy link
Member

curious to hear if there is a performance improvement with this over what exists?

The pruning operation is definitely slower than current one, which just iterate the maintained orphan records, the new methods need to load some nodes to partially traverse the tree. The main advantages are:

  • db size reduction, don't need to store the orphan records (24% reduction in my test on our testnet rocksdb node).
  • main operations faster, don't need to calculate and insert the orphan records.

can you provide benchmarks on the slow down. I think there is some margin of slow down for a trade off that could be accepted and with fast node system it shouldn't make a difference

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 9, 2022

curious to hear if there is a performance improvement with this over what exists?

The pruning operation is definitely slower than current one, which just iterate the maintained orphan records, the new methods need to load some nodes to partially traverse the tree. The main advantages are:

  • db size reduction, don't need to store the orphan records (24% reduction in my test on our testnet rocksdb node).
  • main operations faster, don't need to calculate and insert the orphan records.

can you provide benchmarks on the slow down. I think there is some margin of slow down for a trade off that could be accepted and with fast node system it shouldn't make a difference

sure, I'll do them next week, the basic intuition is it depends by a lot on the node cache, if we are pruning a recent version whose orphaned nodes are still hot in the node cache, then it'll be pretty fast.

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 13, 2022

@tac0turtle @cool-develope I'm closing this PR for following reasons:

  • @cool-develope 's diff algorithm in feat: remove orphans #646 is simpler, I've tried it in python-iavl, both algorithm visits the same number of nodes, but feat: remove orphans #646 is simpler to implement. Although I think there's some issues with tree visit order which I've left comments in the PR.
  • We can already start deleting orphan records now for archive nodes even without this PR, when needed we can always create separate offline tools to do the pruning.
  • we can focus the discussion in feat: remove orphans #646, about the best long term solution.

@yihuang yihuang closed this Dec 13, 2022
@yihuang yihuang deleted the diff-tree branch December 13, 2022 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants