Performance Considerations
Introduction
This tutorial covers some of the performance considerations when using MultiScaleTreeGraph.jl. It is not meant to be a comprehensive guide, but rather a starting point for users to understand the performance implications of their code.
It is important to note that MultiScaleTreeGraph.jl is high-performance by design. The package is designed to be as fast as possible considering the most common use cases. However, there are some things that the user can do to improve performance.
Performance Tips
Vocabulary
This page uses a few technical words. Here is what they mean in MTG terms:
- Traversal: visiting nodes one after another in the graph.
- Request: asking the MTG for data, for example
descendants(mtg, :Length)orancestors(node, :Width). - Direct traversal: reading the graph by following parent/children links directly.
- Indexed traversal: using a precomputed lookup table to answer some descendant requests faster.
- DFS (Depth-First Search): one way to visit nodes; it goes deep into one branch before moving to the next branch.
Attribute backend
By default, read_mtg uses ColumnarStore, a typed per-symbol columnar backend. This is optimized for traversal + attribute extraction workloads and reduces repeated dictionary lookups in hot loops.
The explicit attribute API is:
attribute(node, :key; default=nothing)attribute!(node, :key, value)attributes(node, format=:namedtuple | :dict)add_column!,drop_column!,rename_column!for schema updates
read_mtg always uses the typed columnar backend. If you build nodes manually, you can still pass Dict/NamedTuple values, and they are converted automatically to columnar attributes.
MTG encoding
The MTG encoding is the type used to store the MTG information about the node, i.e. the scale, index, symbol and link.
By default, MultiScaleTreeGraph.jl uses a mutable encoding (MutableNodeMTG), which allows for modifying this information. However, if the user does not need to modify these, it is recommended to use an immutable encoding instead (NodeMTG). This will improve performance significantly.
The internal representation of MTG symbol and link values is based on Symbol for faster comparisons and lower repeated allocations. For backward compatibility, string inputs are still accepted everywhere (constructors and filters), but using symbols in performance-critical code is recommended:
NodeMTG(:/, :Internode, 1, 2)
traverse(mtg, x -> x, symbol=:Internode, link=:<)Traversal: node caching
MultiScaleTreeGraph.jl visits all nodes by default when traversing a tree. This is usually fast, but not always optimal. For example, in a very large tree with only two leaves of interest, we still visit every node even though we only need those two leaves.
To improve performance, it is possible to cache any type of traversal, including any kind of filter on the nodes, and then use the cached traversal instead of the default one. This will improve performance significantly.
A cache is simply a saved result that can be reused later. Here, it saves references to the nodes selected by a traversal so the package does not need to re-scan the full tree each time. This usually improves speed at the cost of a small amount of extra memory.
To cache a traversal, you can use cache_nodes!. For example, if you want to cache all the leaf nodes in the MTG, you can do:
cache_nodes!(mtg, symbol = :Leaf)This will cache all the nodes with the symbol :Leaf in the MTG. Then, the tree traversal functions will use the cached traversal to iterate over the nodes.
Tree traversal is very fast, so caching nodes is not always necessary. Caching should be used when the traversal is needed multiple times, and the traversal is sparse, i.e. a lot of nodes are filtered-out.
Traversal: descendants values caching
Similarly to caching nodes during tree traversal, descendants! (the ! means "update in place") provides a way to cache values from descendants of a node. This is useful when these values are needed many times, because it avoids traversing the tree repeatedly. For example, this is useful when computing total biomass supported by each segment.
In-place traversal outputs
For repeated requests on large trees, prefer reusing buffers with in-place methods to reduce allocations:
vals = Float64[]
nodes = typeof(mtg)[]
descendants!(vals, mtg, :Length)
ancestors!(vals, get_node(mtg, 5), :Length)
descendants!(nodes, mtg, self=true)This pattern is especially useful when the same request is executed many times (e.g. across millions of leaves).
type= for descendants/ancestors is deprecated because return eltypes are inferred automatically from typed columns.
Hybrid descendants backend (growth-safe)
For descendants(node, key, ...) on columnar MTGs, the package supports:
:auto(default): use direct graph traversal when the MTG is changing a lot, and switch to indexed traversal when read requests dominate:pointer: always use direct graph traversal:indexed: always use the index (rebuilding it when structure changed)
In other words:
- If your simulation is currently growing a lot (many inserted/deleted organs),
:pointerbehavior is usually best. - If your simulation is mostly reading values repeatedly from a mostly stable structure,
:indexedbehavior can be faster. :autotries to choose between both automatically.
descendants_strategy(mtg) # :auto
descendants_strategy!(mtg, :pointer)
descendants_strategy!(mtg, :indexed)
descendants_strategy!(mtg, :auto)Typical mixed workflow:
# Growth phase (many structural mutations):
descendants_strategy!(mtg, :pointer)
# Analysis phase (many repeated descendant requests):
descendants_strategy!(mtg, :auto) # or :indexed if structure is stable