In the previous article I explained the basic idea behind a technique called Böhm-Berarducci encoding of algebraic data types, and showed a way to implement this technique in Raku. Unless you are already familiar with this formalism, I recommend you read that article first.
In this article I want to illustrate how the Böhm-Berarducci (BB) encoding of a data structure based on algebraic data types can be considered as a universal interpreter. What this means is that it is easy to perform computations that turn the data structure into something else. As an example, I will demonstrate how to create an evaluator and pretty-printer for a parsed polynomial expression.
A parse tree type
Consider expressions of the form a*x^2+b*x+c
or x^3+1
or x*y^2-x^2*y
. Let’s assume we have a parser for such an expression, for example built using parser combinators. Let’s also assume that this parser returns the parsed data as an algebraic data type, defined in Haskell as:
data Term =
Var String
| Par String
| Const Int
| Pow Term Int
| Add [Term]
| Mult [Term]
and in Raku:
role Term {}
role Var [Str \v] does Term {
has Str $.var = v;
}
role Par [Str \p] does Term {
has Str $.par = p;
}
role Const [Int \c] does Term {
has Int $.const = c;
}
role Pow [Term \t, Int \n] does Term {
has Term $.term = t;
has Int $.exp = n;
}
role Add [Array[Term] \ts] does Term {
has Array[Term] $.terms = ts;
}
role Mult [Array[Term] \ts] does Term {
has Array[Term] $.terms = ts;
}
The additional complexity compared to the types discussed in the previous article is that this type is recursive: the Pow
, Add
and Mult
roles take parameters of type Term
.
Before we look at the BB encoding, let’s first write a pretty-printer for this type, using recursive multi sub
s.
# Pretty-print a Term
multi sub ppTerm(Var \t) { t.var }
multi sub ppTerm(Par \c) { c.par }
multi sub ppTerm(Const \n) { "{n.const}" }
multi sub ppTerm(Pow \pw){
ppTerm(pw.term) ~ '^' ~ "{pw.exp}"
}
multi sub ppTerm(Add \t) {
my @pts = map {ppTerm($_)}, |t.terms;
"("~join( " + ", @pts)~")"
}
multi sub ppTerm(Mult \t){
my @pts = map {ppTerm($_)}, |t.terms;
join( " * ", @pts)
}
In the same way we can write an evaluator for this type:
# Evaluate a Term
multi sub evalTerm(%vars, %pars, Var \t) { %vars{t.var} }
multi sub evalTerm(%vars, %pars,Par \c) { %pars{c.par} }
multi sub evalTerm(%vars, %pars,Const \n) { n.const }
multi sub evalTerm(%vars, %pars,Pow \pw){
evalTerm(%vars, %pars,pw.term) ** pw.exp
}
multi sub evalTerm(%vars, %pars,Add \t) {
my @pts = map {evalTerm(%vars, %pars,$_)}, |t.terms;
[+] @pts
}
multi sub evalTerm(%vars, %pars,Mult \t){
my @pts = map {evalTerm(%vars, %pars,$_)}, |t.terms;
[*] @pts
}
Example parse trees
As an example, let’s create the parse tree for a few expressions using the Term
type.
# a*x^2 + b*x + x
my \qterm1 = Add[
Array[Term].new(
Mult[ Array[Term].new(
Par[ "a"].new,
Pow[ Var[ "x"].new, 2].new)
].new,
Mult[
Array[Term].new(
Par[ "b"].new,
Var[ "x"].new)
].new,
Par[ "c"].new
)
].new;
# x^3 + 1
my \qterm2 = Add[
Array[Term].new(
Pow[ Var[ "x"].new, 3].new,
Const[ 1].new
)
].new;
# qterm1 * qterm2
my \qterm = Mult[
Array[Term].new(
qterm1, qterm2
)
].new;
Calling the pretty-printer and evaluator on this term:
say ppTerm( qterm); # => (a * x^2 + b * x + c) * (x^3 + 1)
say evalTerm(
{"x" => 2}, {"a" =>2,"b"=>3,"c"=>4}, qterm
); # => 162
BB encoding of the parse tree type
The BB encoding of the Term
algebraic data type in Raku is pleasingly compact:
role TermBB[&f] {
method unTermBB(
&var:(Str --> Any),
&par:(Str --> Any),
&const:(Int --> Any),
&pow:(Any,Int --> Any),
&add:(Array[Any] --> Any),
&mult:(Array[Any] --> Any)
--> Any
) {
f(&var,&par,&const,&pow,&add,&mult);
}
}
It would of course be even more compact without the signatures, but then we’d have no information about the encoded type.
We could of course use this type directly, but instead I want to look at how we can convert between Term
and TermBB
.
As before, we create our little helpers. Each of the functions below is a constructor which generates the TermBB
instance for the corresponding alternative in the Term
algebraic data type. (When Raku’s macro language is more developed, we will be able to generate these automatically.)
sub VarBB(Str \s --> TermBB) {
TermBB[
sub (\v, \c, \n, \p, \a, \m) { v.(s) }
].new;
}
sub ParBB(Str \s --> TermBB) {
TermBB[
sub (\v, \c, \n, \p, \a, \m) { c.(s) }
].new;
}
sub ConstBB(Int \i --> TermBB) {
TermBB[
sub (\v, \c, \n, \p, \a, \m) { n.(i) }
].new;
}
sub PowBB( TermBB \t, Int \i --> TermBB) {
TermBB[ sub (\v, \c, \n, \p, \a, \m) {
p.( t.unTermBB( v, c, n, p, a, m ), i);
}
].new;
}
sub AddB( Array[TermBB] \ts --> TermBB) {
TermBB[ sub (\v, \c, \n, \p, \a, \m) {
a.( map {.unTermBB( v, c, n, p, a, m )}, ts )
}
].new;
}
sub MultBB( Array[TermBB] \ts --> TermBB) {
TermBB[ sub (\v, \c, \n, \p, \a, \m) {
m.( map {.unTermBB( v, c, n, p, a, m )}, ts )
}
].new;
}
The interesting generators are PowBB
, AddBB
and MultBB
because they are recursive. In PowBB
, the function passed as parameter to the TermBB
role constructor calls p
which has a signature of :(Any,Int --> Any)
, but actually requires an argument of the same type as the return value (we need a -> Int -> a
). The argument t
is of type TermBB
which is a wrapper around a function which, when applied, will return the right type. In the Raku implementation, this function is the method unTermBB
. So we need to call t.unTermBB( ... )
.
In AddBB
and MultBB
, we have an Array[TermBB]
so we need to call unTermBB
on every element, hence the map
call.
Using these generators we can write a single function to convert the algebraic data type into its BB encoding. Unsurprisingly, it is very similar to the pretty-printer and evaluator we wrote for Term
instances:
# Turn a Term into a BB Term
multi sub termToBB(Var \t ) { VarBB(t.var)}
multi sub termToBB(Par \c ) { ParBB( c.par)}
multi sub termToBB(Const \n) {ConstBB(n.const)}
multi sub termToBB(Pow \pw ) {
PowBB( termToBB(pw.term), pw.exp)
}
multi sub termToBB(Add \t ) {
AddBB( typed-map( TermBB, t.terms, &termToBB ))
}
multi sub termToBB(Mult \t ) {
MultBB( typed-map( TermBB, t.terms, &termToBB ))
}
# map &f and return in an Array of type T
sub typed-map (\T,\lst,&f) {
Array[T].new(map {f($_) }, |lst )
}
Because PowBB
, AddBB
and MultBB
require a TermBB
, we need to call termToBB
on the Term
fields. And because AddBB
and MultBB
take an array of Term
, we need a map
. However, Raku’s map
returns values of type Seq
, so we need an explicit conversion into Array
.
We can now convert any data structure of type Term
into its BB encoding:
my \qtermbb = termToBB( qterm);
say qtermbb.raku; # => TermBB[Sub].new
Interpreter 1: Pretty-printer with BB encoding
To create a pretty-printer for the BB-encoded type, we write implementations for each alternative, and the unTermBB
call magically combines these.
sub ppTermBB(TermBB \t --> Str){
sub var( \x ) { x }
sub par( \x ) { x }
sub const(\x ) { "{x}" }
sub pow( \t, \m ) { t ~ "^{m}" }
sub add( \ts ) { "("~join( " + ", ts)~")" }
sub mult( \ts ) { join( " * ", ts) }
t.unTermBB( &var, &par, &const, &pow, &add, &mult);
}
Compared with ppTerm
(copied below for convenience), the main differences are that there is no recursion and no need to map
anything. We also don’t need a multi sub
to pattern match on the constructors, and there is no need to unpack the values stored in the type using attribute accessors. As a result, the BB version is markedly less cluttered.
multi sub ppTerm(Var \t --> Str) { t.var }
multi sub ppTerm(Par \c --> Str) { c.par }
multi sub ppTerm(Const \n --> Str) { "{n.const}" }
multi sub ppTerm(Pow \pw --> Str){
ppTerm(pw.term) ~ '^' ~ "{pw.exp}"
}
multi sub ppTerm(Add \t --> Str) {
my @pts = map {ppTerm($_)}, |t.terms;
"("~join( " + ", @pts)~")"
}
multi sub ppTerm(Mult \t --> Str){
my @pts = map {ppTerm($_)}, |t.terms;
join( " * ", @pts)
}
Interpreter 2: Evaluator with BB encoding
And an evaluator is equally simple:
sub evalTermBB( %vars, %pars, \t) {
t.unTermBB(
-> \x { %vars{x} },
-> \x { %pars{x} },
-> \x {x},
-> \t,\m { t ** m},
-> \ts { [+] ts},
-> \ts { [*] ts}
);
}
As with evalTerm
below, we pass hashes for variable and parameter definitions as arguments to provide context for the evaluation. In the BB version we need to do this only once, rather than for every multi variant, so I have written it below using a given/when
. Even then, the BB version is a lot cleaner, for the same reasons as above.
sub evalTerm(%vars, %pars, Term \t) {
given t {
when Var { %vars{t.var} }
when Par { %pars{t.par} }
when Const { t.const }
when Pow { evalTerm(%vars, %pars,t.term) ** t.exp }
when Add {
my @pts = map {
evalTerm(%vars, %pars,$_)
}, |t.terms;
[+] @pts
}
when Mult {
my @pts = map {
evalTerm(%vars, %pars,$_)
}, |t.terms;
[*] @pts
}
}
}
Interpreter 3: Converting TermBB
to Term
Finally, let’s look at converting TermBB
to Term
. This is yet another type of interpreter so we can follow exactly the same approach as before:
sub toTerm(TermBB \t --> Term){
sub var( \x ) { Var[x].new }
sub par( \x ) { Par[x].new }
sub const( $x ) { Const[$x].new }
sub pow( \t, $m ) { Pow[ t, $m].new }
sub add( \ts ) { Add[ ts ].new }
sub mult( \ts ) { Mult[ ts ].new) }
t.unTermBB( &var, &par, &const, &pow, &add, &mult);
}
say toTerm(qtermbb).raku;
Using the BB type directly
In the examples above I have created the data structures using the Term
type and converted the result to a TermBB
type. We can of course also directly use the BB type. If we don’t use strict typing and make the argument of Add
and Mult
slurpy, we get a nice and clean representation:
# a*x^2 + b*x + x
my \qtermbb1 = AddBB(
MultBB(
ParBB( "a"),
PowBB( VarBB( "x"), 2)
),
MultBB(
ParBB( "b"),
VarBB( "x")
),
ParBB( "c")
);
# x^3 + 1
my \qtermbb2 = AddBB(
PowBB( VarBB( "x"), 3),
ConstBB(1)
);
# qterm1 * qterm2
my \qtermbb3 = MultBB(
qterm1, qterm2
);
This is structurally very similar to the examples using the Term
type. We can obtain exactly the same representation by using a slurpy helper function to wrap the role constructors for Term
. See the code in no-b-timing.raku` and ubb-timing.raku for details.
The code as presented above is not entirely correct: I have not always typed everything explicitly, but the explicit signatures in the role definition will cause type errors unless everything is explicitly typed. See the code in tbb-timing.raku for details.
The code in no-bb-timing
and ubb-timing
is comparable in terms of complexity. I ran a timing test, and the BB implementation of the algebraic data type is about 20% slower than the ‘ordinary’ implementation. However, the fully-typed version tbb-timing
is three times slower. Types in Raku are clearly not zero-cost abstractions.
On the other hand, somewhat paradoxically, we don’t really need this explicit typing. It is useful to write down the function types for the BB encoding, and I think it helps with the explanations, but the actual type safety comes from the algebraic data types that we created.
Conclusion
In this article and the previous one I have shown another way to implement algebraic data types in Raku. As with the approach discussed in ‘Roles as Algebraic Data Types in Raku’, I use a role to create the type. However, in this approach the entire data structure is encoded as a function using the Böhm-Berarducci encoding. From a type theoretical perspective, both approaches are precisely equivalent. In terms of coding effort and performance, both approaches are comparable.
The advantage of the BB approach is that because the data is encoded as a function, it becomes easier to create interpreters for the data type, and I have illustrated this with a pretty-printer and evaluator for a parsed expression. All interpreters for BB types have the same structure, which is why I call it a universal interpreter. The key feature is that these interpreters do not require any explicit recursion.
The complete code for both articles is in universal-interpreter.raku