Morphisms for a tree
tree-morphisms.RmdPreface
This vignette is a tutorial which walks through implementation of
base functors for an existing tree object (Node, from
data.tree), followed by implementing a custom
Tree type which behaves similarly.
For both types, the tutorial guides building some functions which
make use of the morphisms included in the package. However, as will
become clear, writing unfolds (apo, para,
futu) for the data.tree Node is
problematic. A problem does arise in that when adding child nodes which
are named the same as an existing node in the tree, the behaviour is
unpredictable.
Sample code which produces this behaviour will be included in the
tutorial, and will lead directly on to a custom Tree type
which, while not as fully-featured as the functionality of
data.tree, does behave predictably with duplicate
nodes.
The data.tree Node type
Defining fmap
library (data.tree)
fmap.Node <- function (tr, f) {
working = Clone (tr, attributes=F)
traversal = Traverse (working)
Do (traversal, function(node) node$name <- f (node$name))
return (working)
}Let’s start by implementing fmap for the
data.tree Node object. Confusingly,
referencing a data.tree object with a new variable also
affects the original object. We need to clone the object, which is not
very efficient, but fine for this example, which always returns a new
object without affecting the original.
Defining the “base” functor
NodeF <- function (r, l) {
res = list (type = "NodeF", root=r, forest=l)
class (res) = "NodeF"
return (res)
}
is.NodeF <- function (trf) {
return ("NodeF" %in% class (l))
}
fmap.NodeF <- function (nf, f) {
if (length (nf$forest) == 0)
return (NodeF (nf$root, list()))
else
return (NodeF (nf$root, fmap (nf$forest, f)))
}The “base” functor of a Node object has a
label/attribute type called root and a carrier type called forest. The
fmap function acts on the carrier type, unless the
forest is empty (we’ve hit a ‘leaf’) in which case it acts on the
attribute type.
Translating between the functor and its “base” functor
We need two functions which unwrap and wrap a level of recursion,
respectively. The project function will take a
Node and “project” one layer of it onto the “base” functor,
and the embed function will take a Node “base”
functor and “embed” one layer of it in a tree structure.
The project function for a Node object is
as follows:
project.Node <- function (node)
return (NodeF (node$name, node$children))In contrast, the writing an embed() function ends up
being problematic because of the behaviour of
Node$AddChildNode with duplicated nodes, but assuming that
no duplicates are extant, the following function works:
Conversion to a list of lists and vice versa
The data.tree package includes the functions
ToListSimple and FromListSimple. This example
uses the acme example in the package vignette as it
contains no duplicate nodes, converts it to a list of lists, and then
demonstrates use of cata and ana to convert between a list of lists and
our own Tree type.
Note that these functions assume that there is only one label as in
the vignette example. In data.tree, it is actually possible
to have an arbitrary number of labels, although they aren’t displayed by
the default printing function.
Here is some sample data as given by the data.tree introduction:
library (data.tree)
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")
acme
#> levelName
#> 1 Acme Inc.
#> 2 ¦--Accounting
#> 3 ¦ ¦--New Software
#> 4 ¦ °--New Accounting Standards
#> 5 ¦--Research
#> 6 ¦ ¦--New Product Line
#> 7 ¦ °--New Labs
#> 8 °--IT
#> 9 ¦--Outsource
#> 10 ¦--Go agile
#> 11 °--Switch to RNow that we have a list of lists to work with, we can use
ana to build up a Node structure, and
cata to tear it back down:
fromNestedList <- function (lst) {
fromNested = function (nl) {
NodeF (nl$name, nl[-1])
}
ana (fromNested, lst)
}
toNestedList <- function (node) {
toNested = function (nf) {
branch = list (name = nf$root)
return (append (branch, nf$forest))
}
cata (toNested, node)
}We can now combine the two to get the same result as we put in:
acme |> ToListSimple () |> fromNestedList () |> toNestedList () |> FromListSimple ()
#> levelName
#> 1 Acme Inc.
#> 2 ¦--Accounting
#> 3 ¦ ¦--New Software
#> 4 ¦ °--New Accounting Standards
#> 5 ¦--Research
#> 6 ¦ ¦--New Product Line
#> 7 ¦ °--New Labs
#> 8 °--IT
#> 9 ¦--Outsource
#> 10 ¦--Go agile
#> 11 °--Switch to RPretty-printing trees
The following example is based on the one from Stack Overflow, although it generalises to non-binary trees.
drawTree <- function (tree) {
pad = function (body, final, l) {
if (length (l) == 0)
return (list())
else if (length (l) == 1)
return (list (paste (body, l[[1]], sep='')))
else {
firstPortion = l[[1]]
lastPortion = l[2:length(l)]
res = append (list (paste (body, firstPortion, collapse='', sep=''))
, lapply (lastPortion, \(x) paste (final, x, sep='')))
return (res)
}
}
prettyPrintHelper <- function (br) {
if (length (br$forest) == 0)
return (list (br$root))
else
return (append (list (br$root), prettyPrintForest (br)))
}
prettyPrintForest <- function (br) {
forest = br$forest
forlen = length (forest)
if (forlen == 1)
return (pad ("┗━ ", " ", prettyPrintHelper (forest[[1]])))
else {
padding1 = lapply (forest[1:(forlen-1)]
, \(x) pad ("┣━ ", "┃ ", prettyPrintHelper (x)))
padding2 = pad ("┗━ ", " ", prettyPrintHelper (forest[[forlen]]))
return (append (padding1, padding2))
}
}
cat (paste (as.list (unlist (prettyPrintHelper (tree), recursive=F))
, collapse="\n", sep='')
, '\n')
}The main advantage of using cata, as follows, is being
able to reduce recursive calls of prettyPrintForest() and
prettyPrintHelper(), which, confusingly, call each other.
Otherwise, the function which follows is still fairly complicated. Given
that apo gives access to level information, and
histo gives access to past results, it should be feasible
to write a function which makes use of these.
drawTreeCata <- function (tree) {
pad = function (body, final, l) {
if (length (l) == 0)
return (list())
else if (length (l) == 1)
return (list (paste (body, l[[1]], sep='')))
else
return (append (list (paste (body, l[[1]], collapse='', sep=''))
, lapply (l[2:length(l)], \(x) paste (final, x, sep=''))))
}
prettyPrint <- function (br) {
if (length (br$forest) == 0)
return (list (br$root))
else if (length (br$forest) == 1)
return (append (pad ("┗━ ", " ", br$forest), br$root, after = 0))
else {
forest = br$forest
forlen = length (forest)
padding1 = lapply (forest[1:forlen-1], \(x) pad ("┣━ ", "┃ ", x))
padding2 = pad ("┗━ ", " ", forest[[forlen]])
padding = append (padding1, padding2)
return (append (padding, br$root, after = 0))
}
}
res = cata (prettyPrint, tree)
cat (paste (purrr::flatten (res), collapse='\n', sep=''), '\n')
}
acme |> ToListSimple () |> fromNestedList () |> drawTreeCata ()
#> Acme Inc.
#> ┣━ Accounting
#> ┃ ┣━ New Software
#> ┃ ┗━ New Accounting Standards
#> ┣━ Research
#> ┃ ┣━ New Product Line
#> ┃ ┗━ New Labs
#> ┗━ IT
#> ┣━ Outsource
#> ┣━ Go agile
#> ┗━ Switch to RTripping up data.tree
While the above examples work fine, it’s fairly trivial to trip
data.tree up. The expected behaviour is to add another,
identical sub-tree, called “rabbit family”, but the sub-trees end up
getting mixed, or lost.
dups <- Node$new("rabbit warren")
family1 <- dups$AddChild ("rabbit family")
rabbit1.1 <- family1$AddChild ("rabbit matriarch")
rabbit1.2 <- family1$AddChild ("rabbit baby")
rabbit1.3 <- family1$AddChild ("rabbit engineer")
family2 <- dups$AddChild ("rabbit family")
rabbit2.1 <- family2$AddChild ("rabbit matriarch")
rabbit2.2 <- family2$AddChild ("rabbit scout")
rabbit2.3 <- family2$AddChild ("rabbit professor")
family3 <- dups$AddChild ("solitary pair of hares")
hare1 <- family3$AddChild ("solitary hare #1")
family4 <- dups$AddChild ("solitary hare #2")
family5 <- dups$AddChild ("solitary hare #2")
dups
#> levelName
#> 1 rabbit warren
#> 2 ¦--rabbit family
#> 3 ¦ ¦--rabbit matriarch
#> 4 ¦ ¦--rabbit scout
#> 5 ¦ °--rabbit professor
#> 6 ¦--solitary pair of hares
#> 7 ¦ °--solitary hare #1
#> 8 °--solitary hare #2A custom Tree type
Defining a custom Tree type
Since we’re not using an existing type, we need to define functions
which generate the new S3 type Tree, and a
fmap implementation.
Tree <- function (r, l) {
res = list (type = "Tree", root=r, forest=l)
class (res) = "Tree"
return (res)
}
is.Tree <- function (x) {
return ("Tree" %in% class (x))
}
fmap.Tree <- function (tr, f) {
if (length (tr$forest) == 0)
return (Tree (f (tr$root), list()))
else {
newForest = fmap (tr$forest, \(x) fmap.Tree (x, f))
return (Tree (f (tr$root), newForest))
}
}Defining the “base” functor
Now, define TreeF, the “base” functor of
Tree. This is near-identical to NodeF, which
will come in handy later.
TreeF <- function (r, l) {
res = list (type = "TreeF", root=r, forest=l)
class (res) = "TreeF"
return (res)
}
is.TreeF <- function (trf) {
return ("TreeF" %in% class (l))
}
fmap.TreeF <- function (trf, f) {
if (length (trf$forest) == 0)
return (TreeF (trf$root, list()))
else
return (TreeF (trf$root, fmap (trf$forest, f)))
}Redefining the fromNestedList() function
Recall that the “base” functor of our new Tree type is
near-identical to the “base” functor of the data.tree
Node type. Both NodeF and TreeF
are S3 objects which are fundamentally a list with $carrier
and $forest attributes. Therefore, it should be sufficient
just to redefine fromNestedList(), the only change of which
is returning TreeF as opposed to NodeF.
Indeed, it should be possible to call drawTreeCata()
directly on a Tree object.
fromNestedList_ <- function (lst) {
fromNested = function (nl) {
TreeF (nl$name, nl[-1])
}
ana (fromNested, lst)
}
acme |> ToListSimple () |> fromNestedList_ () |> drawTreeCata ()
#> Acme Inc.
#> ┣━ Accounting
#> ┃ ┣━ New Software
#> ┃ ┗━ New Accounting Standards
#> ┣━ Research
#> ┃ ┣━ New Product Line
#> ┃ ┗━ New Labs
#> ┗━ IT
#> ┣━ Outsource
#> ┣━ Go agile
#> ┗━ Switch to RImplementing a monad instance
As a bonus, we can also define a monad instance for the new
Tree type.
bind.Tree <- function (tr, f) {
x = tr$root
txs = tr$forest
k = f(x)
y = k$root
tys = k$forest
p = lapply (txs, \(a) a %>>=% f)
return (Tree (y, c(tys, p)))
}Let’s check the monad laws, which can be summarised as follows:
- Left identity:
return >=> h ≡ h
- Right identity:
f >=> return ≡ f
- Associativity:
(f >=> g) >=> h ≡ f >=> (g >=> h)
First, define the equivalent of Haskell’s return(),
which I’ve called escape:
escape.Tree <- function (x)
return (Tree (x, list()))
f <- function (x)
return (Tree (x+2, list (escape.Tree (x+3), escape.Tree (x+4))))
g <- function (x)
return (Tree (x+4, list ()))Now, implement the laws above in code:
leftIdentityLHS <- escape.Tree (3) %>>=% f
leftIdentityRHS <- f (3)
rightIdentityLHS <- f (3) %>>=% escape.Tree
rightIdentityRHS <- f (3)
associativityTestTree <- Tree (3, list (
Tree (4, list())
, Tree (5, list(Tree (6, list()), Tree (7, list())))
))
associativityLHS <- (associativityTestTree %>>=% f) %>>=% g
associativityRHS <- associativityTestTree %>>=% (\(x) f(x) %>>=% g)Evaluate equivalence by drawing the trees to the terminal:
leftIdentityLHS |> drawTreeCata ()
#> 5
#> ┣━ 6
#> ┗━ 7
leftIdentityRHS |> drawTreeCata ()
#> 5
#> ┣━ 6
#> ┗━ 7
rightIdentityLHS |> drawTreeCata ()
#> 5
#> ┣━ 6
#> ┗━ 7
rightIdentityRHS |> drawTreeCata ()
#> 5
#> ┣━ 6
#> ┗━ 7
associativityLHS |> drawTreeCata ()
#> 9
#> ┣━ 10
#> ┣━ 11
#> ┣━ 10
#> ┃ ┣━ 11
#> ┃ ┗━ 12
#> ┗━ 11
#> ┣━ 12
#> ┣━ 13
#> ┣━ 12
#> ┃ ┣━ 13
#> ┃ ┗━ 14
#> ┗━ 13
#> ┣━ 14
#> ┗━ 15
associativityRHS |> drawTreeCata ()
#> 9
#> ┣━ 10
#> ┣━ 11
#> ┣━ 10
#> ┃ ┣━ 11
#> ┃ ┗━ 12
#> ┗━ 11
#> ┣━ 12
#> ┣━ 13
#> ┣━ 12
#> ┃ ┣━ 13
#> ┃ ┗━ 14
#> ┗━ 13
#> ┣━ 14
#> ┗━ 15It looks like our type satisfies the monad laws. However,
implementing this as a unit test, as for the list type, is non-trivial,
as R has no way to know that two Tree objects are equal.
Writing a function to compare the structure / node values could enable
formally testing equality of the above LHS/RHS trees.
Anyway, implementing bind enables neat operations like the following:
acme_ <- acme |> ToListSimple () |> fromNestedList_ ()
treeToUpper <- purrr::compose (escape.Tree, toupper)
acme_ %>>=% treeToUpper |> drawTreeCata ()
#> ACME INC.
#> ┣━ ACCOUNTING
#> ┃ ┣━ NEW SOFTWARE
#> ┃ ┗━ NEW ACCOUNTING STANDARDS
#> ┣━ RESEARCH
#> ┃ ┣━ NEW PRODUCT LINE
#> ┃ ┗━ NEW LABS
#> ┗━ IT
#> ┣━ OUTSOURCE
#> ┣━ GO AGILE
#> ┗━ SWITCH TO R