Performance Considerations

Introduction

This tutorial covers some of the performance considerations when using MultiScaleTreeGraph.jl. It is not meant to be a comprehensive guide, but rather a starting point for users to understand the performance implications of their code.

It is important to note that MultiScaleTreeGraph.jl is high-performance by design. The package is designed to be as fast as possible considering the most common use cases. However, there are some things that the user can do to improve performance.

Performance Tips

Attribute types

The type used to store the attributes in the nodes are completely free, and can have a significant impact on performance. By default, the attributes are stored in a Dict{Symbol, Any}. This is a very flexible type that allows for getting attributes by name, updating their values, and adding or deleting attributes. However, this flexibility comes at a cost. The Dict type is fast but slower than more optimized types.

If the user just wants to read an MTG and extract information from it, but not update values or add new attributes, it is recommended to use a NamedTuple instead. This will improve performance significantly.

If the user still want to update values, but does not need to repeatedly add or delete attributes, it is recommended to use a MutableNamedTuple. This will improve performance significantly over Dicts.

MTG encoding

The MTG encoding is the type used to store the MTG information about the node, i.e. the scale, index, symbol and link.

By default, MultiScaleTreeGraph.jl uses a mutable encoding (MutableNodeMTG), which allows for modifying this information. However, if the user does not need to modify these, it is recommended to use an immutable encoding instead (NodeMTG). This will improve performance significantly.

Traversal: node caching

MultiScaleTreeGraph.jl traverses all nodes by default when performing tree traversal. The traversal is done in a recursive manner so it is performant, but not always as fast as it could be. For example, we could have a very large tree with only two leaves at the top. In this case, we would traverse all nodes in the tree, even though we only need to traverse two nodes.

To improve performance, it is possible to cache any type of traversal, including any kind of filter on the nodes, and then use the cached traversal instead of the default one. This will improve performance significantly.

Note

A cache is a data structure that stores the result of a computation so that it can be reused later. In this case, the cache stores a pointer to the nodes from the traversal so that it can be efficiently reused later. This is a common technique to improve performance at the cost of memory, though the memory cost is usually very small.

To cache a traversal, you can use cache_nodes!. For example, if you want to cache all the leaf nodes in the MTG, you can do:

cache_nodes!(mtg, symbol = "Leaf")

This will cache all the nodes with the symbol "Leaf" in the MTG. Then, the tree traversal functions will use the cached traversal to iterate over the nodes.

Tip

Tree traversal is very fast, so caching nodes is not always necessary. Caching should be used when the traversal is needed multiple times, and the traversal is sparse, i.e. a lot of nodes are filtered-out.

Traversal: descendants values caching

Similarly to caching nodes during tree traversal, the mutating version of descendants -descendants!- provides a way to cache the values from the descendants of a node. This is useful when the descendants of a node are needed multiple times, as it avoids traversing the tree multiple times. For example, this is useful when computing the total biomass of all wood each segment supports in a tree, as the biomass of a node is the sum of the biomass of its descendants.