You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Traverser context builder defines a context strategy (see TraverseContextBuilder.ContextStrategy), which helps to control result during traversal.
In case of post order traversal, the action to set result is invoked once traversal of a current node is complete (hence the name).
On a medium and large graphs this final termination could happen very far from start/root node, the path from start to this node could easily include thousands of intermediate nodes.
Current implementation of NORMAL_STRATEGY propagates result from current context to parent, and then from parent to grandparent, which utilize application call stack.
On a JVM with standard configuration (512K), this could generate StackOverflowError once call stack is exceeded.
In my case, it took 14-15K nodes in a path to cause the error.
There are several ways to mitigate it.
Increase JVM stack size
Use configuration to increase stack size with java -Xss1M (10M, etc). This is usually enough to enable traversal on a graph with millions nodes.
Create custom context strategy
Another way to avoid stack overflow is to handle result differently by utilizing custom context strategy.
This example is for illustration purposes, it assumes default configuration and expects custom strategy. It gives up ability to call strategy per every result, therefore it is not general enough.
Traverser context builder defines a context strategy (see TraverseContextBuilder.ContextStrategy), which helps to control result during traversal.
In case of post order traversal, the action to set result is invoked once traversal of a current node is complete (hence the name).
On a medium and large graphs this final termination could happen very far from start/root node, the path from start to this node could easily include thousands of intermediate nodes.
Current implementation of NORMAL_STRATEGY propagates result from current context to parent, and then from parent to grandparent, which utilize application call stack.
On a JVM with standard configuration (512K), this could generate StackOverflowError once call stack is exceeded.
In my case, it took 14-15K nodes in a path to cause the error.
There are several ways to mitigate it.
Increase JVM stack size
Use configuration to increase stack size with
java -Xss1M
(10M, etc). This is usually enough to enable traversal on a graph with millions nodes.Create custom context strategy
Another way to avoid stack overflow is to handle result differently by utilizing custom context strategy.
This example is for illustration purposes, it assumes default configuration and expects custom strategy. It gives up ability to call strategy per every result, therefore it is not general enough.
Find very unit test to illustrate above in my private gist.
Question to the community
Traverser is designed as a general-purpose library and some limitations and tradeoffs needs to be made.
What is more important for default configuration: ability to traverse large graphs or functionality set? Shall it strive to have both?
PS
This is practical question for anyone, who operates with linked and connected data on a scale of wikidata (~100M "entities").
The text was updated successfully, but these errors were encountered: