Transform an MTG

A note on anonymous functions

A lot of examples in this tutorial use anonymous functions. These functions are just a way to quickly define a function. For example a function that adds 1 to its input argument would usually be declared as follows:

function plus_one(x)
    x + 1
end

Here we have a name for our function: "plus_one". But sometimes we don't need to name our function because its only usage is to be passed to another function. In this case we can declare an anonymous function like so:

x -> x + 1

This is exactly the same function, but without a name.

Note

We use x here because it is more or less of a standard, but we could use any other argument name. You'll see that we use node instead when referring to an MTG node (node -> node.var), and x when we refer to a node attribute (x -> x + 1).

Introduction to MTG transforming

MTGs can be very large, and it quickly becomes impossible to manually change the attribute values of the nodes.

Instead, you can compute new attributes for all nodes in an MTG using transform!.

The syntax of transform! is very close to the one from DataFrames.jl. It has several forms that allow to perform computations either on the node or the node attributes directly.

Here is a summary of the different forms you can use:

a :var_name => :new_var_name pair. This form is used to rename an attribute name
a :var_name => function => :new_var_name or [:var_name1, :var_name2...] => function => :new_var_name. The variables are declared as a Symbol or a String (or a vector of), and they are passed as positional arguments to function. The new variable name is optional, and is automatically generated if not provided by concatenating the source column name(s) and the function name if any, this form would be used as: :var_name => function.
a function => :new_var_name form that applies a function to a node and puts the results in a new attribute. This form is usually applied when searching ancestors or descendants values.
a function form that applies a mutating function to a node, without expecting any output. This form is used when using a function that already mutates the node, without the need to return anything, e.g. branching_order!.

This tutorial is a deep dive into these different forms.

Note

All examples use the mutating version transform!, but there is a non-mutating version too (transform). It is used likewise but returns a modified copy of the mtg, which is a little bit slower.

Form 1: Rename an attribute

Renaming an attribute in an MTG is very simple. It uses the exact same syntax as DataFrames.jl. First, let's check which attributes are available in the MTG:

get_attributes(mtg)

8-element Vector{Symbol}:
 :description
 :symbols
 :scales
 :XEuler
 :Length
 :Width
 :dateDeath
 :isAlive

Let's rename :Width to remove the capital letter and make it all lowercase:

transform!(mtg, :Width => :width)

Let's check if the attribute name changed:

print(get_attributes(mtg))

[:description, :symbols, :scales, :XEuler, :Length, :dateDeath, :isAlive, :width]

Yes it did!

The equivalent call with the non-mutating version of transform is:

new_mtg = transform(mtg, :Width => :width)

print(get_attributes(new_mtg))

[:description, :symbols, :scales, :XEuler, :Length, :dateDeath, :isAlive, :width]

Form 2: Compute new attributes based on other attributes

We can also compute a new attribute based on another one. For example we could need the length in meters instead of centimetres. To do so, we can compute it as follows:

transform!(mtg, :Length => (x -> x / 10) => :length_m, ignore_nothing = true)

The magic happens in the :Length => (x -> x / 10) => :length_m expression. transform! takes the :Length variable as input (LHS, Left-hand side of the expression), and use it as the argument for the anonymous function given in the middle of the expression: x -> x / 10. Then it puts the output of the function into a new variable named :length_m (RHS, Right-hand side of the expression)

In fewer words, we divide the :Length attribute by 10 for every node in the MTG, and put the results in a new attribute called :length_m.

We use ignore_nothing = true to tell transform! not to process the nodes with a value of nothing for the input variable (:Length). Otherwise our computation would error because the function we use do not handle nothing values well: nothing / 10 returns an error.

Warning

The anonymous function must be surrounded by parenthesis (like in DataFrames.jl)

Let's check if we can find :length_m in the list of our MTG attributes:

print(get_attributes(mtg))

[:description, :symbols, :scales, :XEuler, :Length, :length_m, :dateDeath, :isAlive, :width]

We can also get its values by using descendants on the root node:

descendants(mtg, :length_m)

6-element Vector{Any}:
  nothing
  nothing
 0.01
 0.02
 0.01
 0.02

We can also get the values in the form of a DataFrame instead:

DataFrame(mtg, :length_m)

7×8 DataFrame

Row	tree	id	symbol	scale	index	parent_id	link	length_m
	String?	Int64?	String?	Int64?	Int64?	Int64?	String?	Float64?
1	/ 1: Scene	1	Scene	0	0	missing	/	missing
2	└─ / 2: Individual	2	Individual	1	0	1	/	missing
3	└─ / 3: Axis	3	Axis	2	0	2	/	missing
4	└─ / 4: Internode	4	Internode	3	0	3	/	0.01
5	├─ + 5: Leaf	5	Leaf	3	0	4	+	0.02
6	└─ < 6: Internode	6	Internode	3	1	4	<	0.01
7	└─ + 7: Leaf	7	Leaf	3	0	6	+	0.02

We can also provide several input variables if we need:

transform!(mtg, [:Length, :width] => ((x,y) -> π * x * y^2) => :volume_cm3, ignore_nothing = true)

Here we provide the input attributes as a Vector of Symbols (could be String also), and given them to an anonymous function that takes two arguments as inputs. Our attributes are given to the anonymous function in order, i.e positional arguments. Then we name our new attribute :volume_cm3. Again, we use ignore_nothing = true to remove the nodes with nothing values for the input attributes :Length and :width.

Let's see the results:

DataFrame(mtg, [:Length, :width, :volume_cm3])

7×10 DataFrame

Row	tree	id	symbol	scale	index	parent_id	link	Length	width	volume_cm3
	String?	Int64?	String?	Int64?	Int64?	Int64?	String?	Float64?	Float64?	Float64?
1	/ 1: Scene	1	Scene	0	0	missing	/	missing	missing	missing
2	└─ / 2: Individual	2	Individual	1	0	1	/	missing	missing	missing
3	└─ / 3: Axis	3	Axis	2	0	2	/	missing	missing	missing
4	└─ / 4: Internode	4	Internode	3	0	3	/	0.1	0.02	0.000125664
5	├─ + 5: Leaf	5	Leaf	3	0	4	+	0.2	0.1	0.00628319
6	└─ < 6: Internode	6	Internode	3	1	4	<	0.1	0.02	0.000125664
7	└─ + 7: Leaf	7	Leaf	3	0	6	+	0.2	0.1	0.00628319

The new name of the attribute (the RHS) is optional though. We could write our first example as:

transform!(mtg, :Length => (x -> x / 10), ignore_nothing = true)

In this case the name of the new attribute is automatically computed based on the input variable name and the name of the function. If the function is anonymous, which is the case in our example, it uses the default "function" name instead. Our new variable name is then called :Length_function.

If we used a function with a name such as log instead of an anonymous function, the new attribute name would be :Length_log. Here's an example with the log function:

transform!(mtg, :Length => log, ignore_nothing = true)

print(get_attributes(mtg))

[:Length_function, :volume_cm3, :description, :symbols, :scales, :XEuler, :Length, :length_m, :Length_log, :dateDeath, :isAlive, :width]

Form 3: Compute a new attribute based on node values

We can compute a new attribute by providing a function directly as the right-hand side instead of an attribute name like so:

transform!(mtg, symbol => :Symbol)

The symbol function takes a node as its first (and only) argument, and returns its symbol. An alternative way of writing this would be:

transform!(mtg, node -> symbol(node) => :Symbol)

This particularly useful when we need to compute a new attribute based on the values of the node itself.

Here we just copied the MTG symbol onto the attributes of the nodes. In this form, it is mandatory to provide a name for the newly created variable, else the function is considered to not return anything (see next form: Form 4: Apply a function to nodes).

Because this form expects a function that works on nodes directly, it is now possible to use the descendants and ancestors functions. For example we can compute the total length of the subtree of each node in an MTG (i.e. the length of all children of a node) as follows:

function get_length_descendants(x)
    nodes_lengths = descendants(x, :Length, ignore_nothing = true)
    if length(nodes_lengths) == 0
        return nothing
    else
        return sum(nodes_lengths)
    end
end

transform!(mtg, get_length_descendants => :length_subtree)

descendants(mtg, :length_subtree)

6-element Vector{Any}:
 0.6000000000000001
 0.6000000000000001
 0.5
  nothing
 0.2
  nothing

Note

This form cannot use ignore_nothing = true because it does not know which attributes to look for before-hand. You'll have to use the filter_fun argument or handle nothing values inside your function instead.

Here we first declared a new function to get the length of all descendants of a node (get_length_descendants), and then compute the sum only if one or more values for length were found. Then we pass this function to transform! and define our new attribute name as :length_subtree. We define the function first for clarity because it needs to handle nothing values properly before the call to sum.

An alternative way to write this would be to first get the vector of length for each node, and then to compute the sum like so:

transform!(
    mtg,
    (node -> descendants(node, :Length, ignore_nothing = true)) => :length_subtree2,
    :length_subtree2 => (x -> length(x) == 0 ? nothing : sum(x)) => :length_subtree2
)

Because transform! computes the expressions sequentially, we can re-use a computation from the last expression. This is exactly what we are doing here. First we get the values of the length of all descendants of each node, and put the result in a new attribute :length_subtree2. Then we re-use the data from this attribute to compute its sum, but only if the length of the data is not 0, and put the result back to the same attribute :length_subtree2.

We can test if both calls returns the same output:

all(descendants(mtg, :length_subtree2) .== descendants(mtg, :length_subtree))

true

Yes they are!

Form 4: Apply a function to nodes

We can also apply a function that performs a computation on the node like Form 3, but does not return a new attribute value. For example it can be useful to use a printing function to help us debug another function call. Here's an example where we want to print the id of the nodes that are leaf nodes:

transform!(mtg, node -> isleaf(node) ? println(node_id(node)," is a leaf") : nothing)

5 is a leaf
7 is a leaf

We can also use this form to mutate the MTG of a node (which is not possible with Form 2). Here's an example where we change the "Internode" symbol into "I":

transform!(mtg, node -> symbol!(node, "I"), symbol = "Internode")

mtg

/ 1: Scene
└─ / 2: Individual
   └─ / 3: Axis
      └─ / 4: I
         ├─ + 5: Leaf
         └─ < 6: I
            └─ + 7: Leaf

Note

If you change the values of the MTG field of the nodes, you can update the header of the MTG stored in the root node. For example here we updated the symbols, so we should do:

mtg[:symbols] = get_classes(mtg).SYMBOL
mtg[:description] = get_description(mtg)

Note that it is not important for writing back to disc as they are automatically updated anyway.

Select an MTG

As in DataFrames, MultiScaleTreeGraph.jl provides a select! function for deleting all attributes not explicitly provided as arguments to the selection. The selection can also apply transformations on the fly following the same format used in transform!, with one more Form though: just the name of the variable to select.

For example we can compute the new length in meters, and keep only this result along with the width as follows:

mtg_select = deepcopy(mtg)

select!(mtg_select, :Length => (x -> x / 10) => :length_m, :Width, ignore_nothing = true)

DataFrame(mtg_select)

7×8 DataFrame

Row	tree	id	symbol	scale	index	parent_id	link	length_m
	String?	Int64?	String?	Int64?	Int64?	Int64?	String?	Float64?
1	/ 1: Scene	1	Scene	0	0	missing	/	missing
2	└─ / 2: Individual	2	Individual	1	0	1	/	missing
3	└─ / 3: Axis	3	Axis	2	0	2	/	missing
4	└─ / 4: I	4	I	3	0	3	/	0.01
5	├─ + 5: Leaf	5	Leaf	3	0	4	+	0.02
6	└─ < 6: I	6	I	3	1	4	<	0.01
7	└─ + 7: Leaf	7	Leaf	3	0	6	+	0.02

There is also a non-mutating version of the function:

mtg_select = select(mtg, :Length => (x -> x / 10) => :length_m, :Width, ignore_nothing = true)

DataFrame(mtg_select)

7×8 DataFrame

Row	tree	id	symbol	scale	index	parent_id	link	length_m
	String?	Int64?	String?	Int64?	Int64?	Int64?	String?	Float64?
1	/ 1: Scene	1	Scene	0	0	missing	/	missing
2	└─ / 2: Individual	2	Individual	1	0	1	/	missing
3	└─ / 3: Axis	3	Axis	2	0	2	/	missing
4	└─ / 4: I	4	I	3	0	3	/	0.01
5	├─ + 5: Leaf	5	Leaf	3	0	4	+	0.02
6	└─ < 6: I	6	I	3	1	4	<	0.01
7	└─ + 7: Leaf	7	Leaf	3	0	6	+	0.02

Traverse an MTG

transform! and select! use traverse! under the hood to apply a function call to each node of an MTG. traverse! is just a little bit less easy to use as it only accepts Form 4. We can obtain the exact same results as the last example of transform! using the same call with traverse!. Let's change the Leaf symbol into L:

traverse!(mtg, node -> symbol!(node, "L"), symbol = "Leaf")

mtg

/ 1: Scene
└─ / 2: Individual
   └─ / 3: Axis
      └─ / 4: I
         ├─ + 5: L
         └─ < 6: I
            └─ + 7: L

A benefit of traverse! is it can be used with a do...end block notation for complex sets of instructions:

traverse!(mtg) do node
    if isleaf(node)
         println(node_id(node)," is a leaf")
    end
end

5 is a leaf
7 is a leaf

Mutate an MTG

For users coming from R, we also provide the @mutate_mtg! macro that is similar to transform! but uses a more tidyverse-alike syntax. All values coming from the MTG node must be preceded by a node., as with the .data$ in the tidyverse. The names of the attributes are shortened to just node.attr_name instead of node_attributes(node).attr_name though. Here's an example usage:

@mutate_mtg!(mtg, volume = π * 2 * node.Length, symbol = "I")

We see that we first name the new attribute and assign the result of the computation. Constants are provided as is, and values coming from the nodes are prefixes by node..

Helpers

You can use helper functions provided by MultiScaleTreeGraph.jl for:

Filtering nodes: isroot, isleaf
Compute the number of leaf nodes in the subtree of a node: nleaves
Apply the pipe_model! to the MTG to compute the cross-section of all nodes based on an initial cross-section.

The pipe model is used in plant physiology (especially on trees) and is built around the coarse hypothesis that each leaf in a plant is (to some extent) connected to the roots via a "pipe" of constant cross-sectional area. The concepts of the pipe model are detailed in Lehnebach et al. (2018).

This package provides an implementation of the pipe model, used as follows:

first_cross_section = 0.34 # the initial cross-section of the plant

transform!(mtg, (node -> pipe_model!(node, first_cross_section)) => :cross_section_pipe)
DataFrame(mtg, :cross_section_pipe)

7×8 DataFrame

Row	tree	id	symbol	scale	index	parent_id	link	cross_section_pipe
	String?	Int64?	String?	Int64?	Int64?	Int64?	String?	Float64?
1	/ 1: Scene	1	Scene	0	0	missing	/	0.34
2	└─ / 2: Individual	2	Individual	1	0	1	/	0.34
3	└─ / 3: Axis	3	Axis	2	0	2	/	0.34
4	└─ / 4: I	4	I	3	0	3	/	0.34
5	├─ + 5: L	5	L	3	0	4	+	0.113333
6	└─ < 6: I	6	I	3	1	4	<	0.226667
7	└─ + 7: L	7	L	3	0	6	+	0.226667

For more information about the implementation, you can check the documentation of the function: pipe_model!.

References

R. Lehnebach, R. Beyer, V. Letort, et P. Heuret, « The pipe model theory half a century on: a review », Annals of Botany, vol. 121, nᵒ 5, p. 773‑795, avr. 2018, doi: 10.1093/aob/mcx194.