I am a lambdacamel and therefore I like to adapt concepts and techniques from functional programming, and in particular from the Haskell language, to Raku. One of the techniques that I use a lot is generic traversals, also known as “Scrap Your Boilerplate” after the title of the paper by Simon Peyton Jones and Ralf Lämmel that introduced this approach. In their words:
Many programs traverse data structures built from rich mutually-recursive data types. Such programs often have a great deal of “boilerplate” code that simply walks the structure, hiding a small amount of “real” code that constitutes the reason for the traversal. ”Generic programming” is the umbrella term to describe a wide variety of programming technology directed at this problem.
So to save you having to write your own custom traversal, this approach gives you generic functions that do traversals on arbitrary data strucures. In this article, I will explain how you can easily implement such generics in Raku for arbitrary role-based datastructures. There is no Haskell in this article.
Roles as datatypes by example
I implemented of these generics for use with role-based datatypes. Raku’s parameterised roles make creating complex datastructures very easy. I use the roles purely as datatypes, so they have no associated methods.
For example, here is an example code snippet in a little language that I use in my research.
map (f1 . f2) (map g (zipt (v1,map h v2)))
The primitives are map
, .
(function composition), zipt
and the tuple (...)
, and the names of functions and vectord. The datatype for the abstract syntax of this little language is called Expr
and looks as follows:
# Any expression in the language
role Expr {}
# map f v
role MapV[Expr \f_,Expr \v_] does Expr {
has Expr $.f = f_;
has Expr $.v = v_;
}
# function composition f . g
role Comp[Expr \f_, Expr \g_] does Expr {
has Expr $.f = f_;
has Expr $.g = g_;
}
# zipt t turns a tuple of vectors into a vector of tuples
role ZipT[Expr \t_] does Expr {
has Expr $.t = t_
}
# tuples are just arrays of Expr
role Tuple[Array[Expr] \e_] does Expr {
has Array[Expr] $.e = e_
}
# names of functions and vectors are just string constants
role Name[Str \n_] does Expr {
has Str $.n = n_
}
The Expr
role is the toplevel datatype. It is empty because it is implemented entirely in terms of the other roles, which thanks to the does
are all of type Expr
. And most of the roles have attributes that are also of type Expr
. So we have a recursive datatype, a tree with the Name
node as leaves.
We can now write the abstract syntax tree (AST) of the example code using this Expr
datatype:
my \ast = MapV[
Comp[
Name['f1'].new,
Name['f2'].new
].new,
MapV[
Name['g'].new,
ZipT[
Tuple[
Array[Expr].new(
Name['v1'].new,
MapV[
Name['h'].new,
Name['v2'].new
].new
)
].new
].new
].new
].new;
The typical way to work with such a datastructure is using a given/when
:
sub worker(Expr \expr) {
given expr {
when MapV {...}
when Comp {...}
when ZipT {...}
...
}
}
Alternatively, you can use a multi sub:
multi sub worker(Mapv \expr) {...}
multi sub worker(Comp \expr) {...}
multi sub worker(ZipT \expr) {...}
...
In both cases, we use the roles as the types to match against for the actions we want to take.
(For more details about algebraic datatypes see my earlier article Roles as Algebraic Data Types in Raku.)
Generics
If I want to traverse the AST above, what I would normally do is write a worker as above, where for every node except the leaf nodes, I would call the worker recursively, for example:
sub worker(Expr \expr) {
given expr {
when MapV {
my \f_ = worker(expr.f);
my \v_ = worker(expr.v);
...
}
...
}
}
But wouldn’t it be nice if I did not have to write that code at all? Enter generics.
I base my naming and function arguments on that of the Haskell library Data.Generics
. It provides many schemes for traversals, but the most important ones are everything
and everywhere
.
-
everything
is a function which takes a datastructure, a matching function, an accumulator and an update function for the accumulator. The matching function defines what you are looking for in the datastructure. The result is put into the accumulator using the update function.sub everything( Any \datastructure, Any \accumulator, &joiner, &matcher --> Any){...}
-
everywhere
is a function which takes a datastructure and a modifier function. The modifier function defines which parts of the datastructure you want to modify. The result of the traversal is a modified version of the datastructure.sub everywhere( Any \datastructure, &modifier --> Any){...}
The most common case for the accumulator is to use a list, so the updated function appends lists to the accumulator:
sub append(\acc, \res) {
return (|acc, |res);
}
As an example of a matching function, let’s for example find all the function and vector names in our AST above:
sub matcher(\expr) {
given expr {
when Name {
return [expr.n]
}
}
return []
}
So if we find a Name
node, we return its n
attribute as a single-element list; otherwise we return an empty list.
my \names = everything(ast,[],&append,&matcher);
# => returns (f1 f2 g h v1 v2)
Or let’s say we want to change the names in this AST:
sub modifier(\t) {
given t {
when Name {
Name[t.n~'_updated'].new
}
default {t}
}
}
my \ast_ = everywhere(ast,&modifier);
# => returns the AST with all names appended with "_updated"
Implementing Generics
So how do we implement these magic everything
and everywhere
functions? The problem to solve is that we want to iterate through the attributes of every role without having to name it. The solution for this is to use Raku’s Metaobject protocol (MOP) for introspection. In practice, we use the Rakudo-specific Metamodel
. We need only three methods: attribute
, get_value
and set_value
. With these, we can iterate through the attributes and visit them recursively.
Attributes can be $
, @
or %
(and even &
but I will skip this). What this means in terms of Raku’s type system is that they can be scalar, Iterable or Associative, and we need to distinguish these cases. With that, we can write everything
as follows:
sub everything (\t, \acc,&update,&match) {
# Arguments a immutable, so copy to $acc_
my $acc_ = acc;
# Match and update $acc_
$acc_ =update($acc_,match(t));
# Test the attribute type
if t ~~ Associative {
# Iterate over the values
for t.values -> \t_elt {
$acc_ = everything(t_elt,$acc_,&update,&match)
}
return $acc_;
}
elsif t ~~ Iterable {
# Iterate
for |t -> \t_elt {
$acc_ = everything(t_elt,$acc_,&update,&match)
}
return $acc_;
}
else {
# Go through all attributes
for t.^attributes -> \attr {
# Not everyting return by ^attributes
# is of type Attribute
if attr ~~ Attribute {
# Get the attribute value
my \expr = attr.get_value(t);
if not expr ~~ Any { # for ContainerDescriptor::Untyped
return $acc_;
}
# Descend into this expression
$acc_ = everything(expr,$acc_,&update, &match);
}
}
}
return $acc_
}
So what we do here essentially is:
- for
@
and%
we iterate through the values - iterate through the attributes using
^attributes
- for each attribute, get the expression using
get_value
- call
everything
on that expression - the first thing
everything
does is update the accumulator
everywhere
is similar:
sub everywhere (\t_,&modifier) {
# Modify the node
my \t = modifier(t_);
# Test the type for Iterable or Associative
if t ~~ Associative {
# Build the updated map
my %t_;
for t.keys -> \t_k {
my \t_v = t{t_k};
%t_{t_k} = everywhere (t_v,&modifier);
}
return %t_;
}
elsif t ~~ Iterable {
# Build the updated list
my @t_=[];
for |t -> \t_elt {
@t_.push( everywhere(t_elt,&modifier) );
}
return @t_;
}
else {
# t is immutable so copyto $t_
my $t_ = t;
for t.^attributes -> \attr {
if attr ~~ Attribute {
my \expr = attr.get_value(t);
if not expr ~~ Any { # for ContainerDescriptor::Untyped
return $t_;
}
my \expr_ = everywhere(expr,&modifier);
attr.set_value($t_,expr_);
}
}
return $t_;
}
return t;
}
So what we do here essentially is:
- for
@
and%
we iterate through the values - iterate through the attributes using
^attributes
- for each attribute, get the expression using
get_value
- call
everywhere
on that expression - update the attribute using
set_value
This works without roles too
First of all, the above works for classes too, because the Metamodel methods are not specific to roles. Furthermore, because we test for @
and %
, the generics above work just fine for data structures without roles, built from hashes and arrays:
my \lst = [1,[2,3,4,[5,6,7]],[8,9,[10,11,[12]]]];
sub matcher (\expr) {
given expr {
when List {
if expr[0] % 2 == 0 {
return [expr]
}
}
}
return []
}
my \res = everything(lst,[],&append,matcher);
say res;
# ([2 3 4 [5 6 7]] [8 9 [10 11 [12]]] [10 11 [12]] [12])
Or for hashes:
my %hsh =
a => {
b => {
c => 1,
a => {
b =>1,c=>2
}
},
c => {
a =>3
}
},
b => 4,
c => {d=>5,e=>6}
;
sub hmatcher (\expr) {
given (expr) {
when Map {
my $acc=[];
for expr.keys -> \k {
if k eq 'a' {
$acc.push(expr{k})
}
}
return $acc;
}
}
return []
}
my \hres = everything(%hsh,[],&append,&hmatcher);
say hres;
# ({b => {a => {b => 1, c => 2}, c => 1}, c => {a => 3}} {b => 1, c => 2} 3)
Conclusion
Generic datastructure traversals are a great way to reduce boilerplate code and focus on the actual purpose of the traversals. And now you can have them in Raku too. I have shown the implementation for the two main schemes everything
and everywhere
and shown that they work for role based datastructures as well as traditional hash- or array-based datastructures.