# Algebraic Datatypes

## Overview

Haskell has a system called *algebraic datatypes* for defining new
types. This sounds fancy, but is rather simple. Let’s dive in by looking
at the standard library definitions of some familiar types:

```
data Bool = True | False
data Ordering = LT | EQ | GT
```

With this syntax you too can define types:

```
-- definition of a type with three values
data Color = Red | Green | Blue
-- a function that uses pattern matching on our new type
rgb :: Color -> [Double]
rgb Red = [1,0,0]
rgb Green = [0,1,0]
rgb Blue = [0,0,1]
```

```
Prelude> :t Red
Red :: Color
Prelude> :t [Red,Blue,Green]
[Red,Blue,Green] :: [Color]
Prelude> rgb Red
[1.0,0.0,0.0]
```

### Fields

Types like `Bool`

, `Ordering`

and `Color`

that just list a bunch of
constants are called *enumerations* or *enums* in Haskell and other
languages. Enums are useful, but you need other types as well. Here we
define a type for reports containing an id number, a title, and a body:

```
data Report = ConstructReport Int String String
```

This is how you create a report:

```
Prelude> :t ConstructReport 1 "Title" "This is the body."
ConstructReport 1 "Title" "This is the body." :: Report
```

You can access the fields with pattern matching:

```
reportContents :: Report -> String
reportContents (ConstructReport _ _ contents) = contents
setReportContents :: String -> Report -> Report
setReportContents contents (ConstructReport id title _) =
ConstructReport id title contents
```

### Constructors

The things on the right hand side of a `data`

declaration are called
*constructors*. `True`

, `False`

, `Red`

and `ConstructReport`

are all
examples of constructors. A type can have multiple constructors, and a
constructor can have zero or more fields.

Here is a datatype for a standard playing card. It has five
constructors, of which `Joker`

has zero fields and the others have one
field.

```
data Card = Joker
| Heart Int
| Club Int
| Spade Int
| Diamond Int
```

Constructors with fields have function type and can be used wherever functions can:

```
Prelude> :t Heart
Heart :: Int -> Card
Prelude> :t Club
Club :: Int -> Card
Prelude> map Heart [1,2,3]
[Heart 1,Heart 2,Heart 3]
Prelude> (Heart . (\x -> x+1)) 3
Heart 4
```

### Sidenote: Deriving

By the way, there’s something missing from our `Card`

type. Look at how
it behaves compared to `Ordering`

and `Bool`

:

```
Prelude> EQ
EQ
Prelude> True
True
Prelude> Joker
<interactive>:1:0:
No instance for (Show Card)
arising from a use of `print' at <interactive>:1:0-4
Possible fix: add an instance declaration for (Show Card)
In a stmt of a 'do' expression: print it
```

The problem is that Haskell does not know how to print the types we
defined. As the error says, they are not part of the `Show`

class. The
easy solution is to just add a `deriving Show`

after the type
definition:

```
data Card = Joker
| Heart Int
| Club Int
| Spade Int
| Diamond Int
deriving Show
```

```
Prelude> Joker
Joker
```

The `deriving`

syntax is a way to automatically make your class a member
of certain basic type classes, most notably `Read`

, `Show`

and `Eq`

.
We’ll talk more about what this means later.

### Algebraic?

So why are these datatypes called algebraic? This is because,
theoretically speaking, each datatype can be a *sum* of constructors,
and each constructor is a *product* of fields. It makes sense to think
of these as sums and products for many reasons, one being that we can
count the possible values of each type this way:

```
-- corresponds to 1+1. Has 2 possible values.
data Bool = True | False
-- corresponds to Bool*Bool, i.e. 2*2. Has 4 possible values
data TwoBools = TwoBools Bool Bool
-- corresponds to Bool*Bool+Bool+1 = 2*2+2+1 = 7
-- Has 7 possible values.
data Complex = Two Bool Bool | One Bool | None
```

There is a rich theory of algebraic datatypes. If you’re interested, you might find more info here or here.

## Type Parameters

We introduced type parameters and parametric polymorphism when
introducing lists in a previous reading. Since then, we’ve seen other
parameterized types like `Maybe`

and `Either`

. Now we’ll learn how we
can define our own parameterized types.

### Defining Parameterized Types

The definition for `Maybe`

is:

```
data Maybe a = Nothing | Just a
```

What’s `a`

? We define a parameterized type by mentioning a *type
variable* (`a`

in this case) on the left side of the `=`

sign. We can
then use the same type variable in fields for our constructors. This is
analogous to polymorphic functions. Instead of defining separate
functions

```
headInt :: [Int] -> Int
headBool :: [Bool] -> Bool
```

and so on, we define one function `head :: [a] -> a`

that works for all
types `a`

. Similarly, instead of defining multiple types

```
data MaybeInt = NothingInt | JustInt Int
data MaybeBool = NothingBool | JustBool Bool
```

we define one type `Maybe a`

that works for all types `a`

.

Here’s our first own parameterized type `Described`

. The values of type
`Described a`

contain a value of type `a`

and a `String`

description.

```
data Described a = Describe a String
getValue :: Described a -> a
getValue (Describe x _) = x
getDescription :: Described a -> String
getDescription (Describe _ desc) = desc
```

```
Prelude> :t Describe
Describe :: a -> String -> Described a
Prelude> :t Describe True "This is true"
Describe True "This is true" :: Described Bool
Prelude> getValue (Describe 3 "a number")
3
Prelude> getDescription (Describe 3 "a number")
"a number"
```

### Syntactic Note

In the above definitions, we’ve used `a`

as a type variable. However any
word that starts with a lower case letter is fine. We could have defined
`Maybe`

like this:

```
data Maybe theType = Nothing | Just theType
```

The rules for Haskell identifiers are:

- Type variables and names for functions and values start lower case
(e.g.
`a`

,`map`

,`xs`

) - Type names and constructor names start with upper case (e.g.
`Maybe`

,`Just`

,`Card`

,`Heart`

)

Note that a type and its constructor can have the same name. This is very common in Haskell code for types that only have one constructor. In this material we try to avoid it to avoid confusion. Here are some examples:

```
data Pair a = Pair a a
data Report = Report Int String String
```

```
Prelude> :t Pair
Pair :: a -> a -> Pair a
```

Beware of mixing up types and constructors. Luckily types and constructors can never occur in the same context, so you get a nice error:

```
Prelude> Maybe
<interactive>:1:1: error:
• Data constructor not in scope: Maybe
Prelude> undefined :: Nothing
<interactive>:2:14: error:
Not in scope: type constructor or class ‘Nothing’
```

### Sidenote: Multiple Type Parameters

Types can have multiple type parameters. The syntax is similar to
defining functions with many arguments. Here’s the definition of the
standard `Either`

type:

```
data Either a b = Left a | Right b
```

## Recursive Types

So far, all of the types we’ve defined have been of constant size. We can represent one report or one colour, but how could we represent a collection of things? We could use lists of course, but could we define a list type ourselves?

Just like Haskell functions, Haskell data types can be *recursive*. This
is no weirder than having an object in Java or Python that refers to
another object of the same class. This is how you define a list of
integers:

```
data IntList = Empty | Node Int IntList
deriving Show
ihead :: IntList -> Int
ihead (Node i _) = i
itail :: IntList -> IntList
itail (Node _ t) = t
ilength :: IntList -> Int
ilength Empty = 0
ilength (Node _ t) = 1 + ilength t
```

We can use the functions defined above to work with lists of integers:

```
Prelude> ihead (Node 3 (Node 5 (Node 4 Empty)))
3
Prelude> itail (Node 3 (Node 5 (Node 4 Empty)))
Node 5 (Node 4 Empty)
Prelude> ilength (Node 3 (Node 5 (Node 4 Empty)))
3
```

Note that we can’t put values other than `Int`

s inside our `IntList`

:

```
Prelude> Node False Empty
<interactive>:3:6: error:
• Couldn't match expected type ‘Int’ with actual
type ‘Bool’
• In the first argument of ‘Node’, namely ‘False’
In the expression: Node False Empty
In an equation for ‘it’: it = Node False Empty
```

To be able to put any type of element in our list, let’s do the same
thing with a type parameter. This is the same as the built in type
`[a]`

, but with slightly clunkier syntax:

```
data List a = Empty | Node a (List a)
deriving Show
```

Note how we need to pass the the type parameter `a`

onwards in the
recursion. We need to write `Node a (List a)`

instead of `Node a List`

.
The `Node`

constructor has two arguments. The first has type `a`

, and
the second has type `List a`

. Here are the reimplementations of some
standard list functions for our `List`

type:

```
lhead :: List a -> a
lhead (Node h _) = h
ltail :: List a -> List a
ltail (Node _ t) = t
lnull :: List a -> Bool
lnull Empty = True
lnull _ = False
llength :: List a -> Int
llength Empty = 0
llength (Node _ t) = 1 + llength t
```

```
Prelude> lhead (Node True Empty)
True
Prelude> ltail (Node True (Node False Empty))
Node False Empty
Prelude> lnull Empty
True
```

Note that just like with normal Haskell lists, we can’t have elements of different types in the same list:

```
Prelude> Node True (Node "foo" Empty)
<interactive>:5:12: error:
• Couldn't match type ‘[Char]’ with ‘Bool’
Expected type: List Bool
Actual type: List [Char]
• In the second argument of ‘Node’, namely
‘(Node "foo" Empty)’
In the expression: Node True (Node "foo" Empty)
In an equation for ‘it’:
it = Node True (Node "foo" Empty)
```

### Example: Growing a Tree

Just like a list, we can also represent a binary tree:

```
data Tree a = Node a (Tree a) (Tree a) | Empty
```

Our tree contains nodes, which contain a value of type `a`

and two child
trees, and empty trees.

In case you’re not familiar with binary trees, they’re a data structure
that’s often used as the basis for other data structures (`Data.Map`

is
based on trees!). Binary trees are often drawn as (upside-down)
pictures, like this:

The highest node in the tree is called the *root* (`0`

in this case),
and the nodes with no children are called `leaves`

(`2`

, `3`

and `4`

in
this case). We can define this tree using our `Tree`

type like this:

```
example :: Tree Int
example = (Node 0 (Node 1 (Node 2 Empty Empty)
(Node 3 Empty Empty))
(Node 4 Empty Empty))
```

The height of a binary tree is length of the longest path from the root
to a leaf. In Haskell terms, it’s how many nested levels of `Node`

constructors you need to build the tree. The height of our example tree
is 3. Here’s a function that computes the height of a tree:

```
treeHeight :: Tree a -> Int
treeHeight Empty = 0
treeHeight (Node _ l r) =
1 + max (treeHeight l) (treeHeight r)
```

```
treeHeight Empty ==> 0
treeHeight (Node 2 Empty Empty)
==> 1 + max (treeHeight Empty)
(treeHeight Empty)
==> 1 + max 0 0
==> 1
treeHeight (Node 1 Empty (Node 2 Empty Empty))
==> 1 + max (treeHeight Empty)
(treeHeight (Node 2 Empty Empty))
==> 1 + max 0 1
==> 2
treeHeight (Node 0 (Node 1 Empty
(Node 2 Empty Empty))
Empty)
==> 1 + max (treeHeight (Node 1 Empty
(Node 2 Empty Empty)))
(treeHeight Empty)
==> 1 + max 2 0
==> 3
```

In case you’re familiar with *binary search trees*, here are the
definitions of the lookup and insert opertions for a binary search tree.
If you don’t know what I’m talking about, you don’t need to understand
this.

```
lookup :: Int -> Tree Int -> Bool
lookup x Empty = False
lookup x (Node y l r)
| x < y = lookup x l
| x > y = lookup x r
| otherwise = True
insert :: Int -> Tree Int -> Tree Int
insert x Empty = Node x Empty Empty
insert x (Node y l r)
| x < y = Node y (insert x l) r
| x > y = Node y l (insert x r)
| otherwise = Node y l r
```

## Record Syntax

If some fields need to be accessed often, it can be convenient to have
helper functions for reading those fields. For instance, the type
`Person`

might have multiple fields:

```
data Person = MkPerson String Int String String String
deriving Show
```

A list of persons might look like the following:

```
people :: [Person]
people = [ MkPerson "Jane Doe" 21 "Houston" "Texas" "Engineer"
, MkPerson "Maija Meikäläinen" 35 "Rovaniemi" "Finland" "Engineer"
, MkPerson "Mauno Mutikainen" 27 "Turku" "Finland" "Mathematician" ]
```

Suppose that we need to find all engineers from Finland:

```
query :: [Person] -> [Person]
query [] = []
query (MkPerson name age town state profession):xs
| state == "Finland" && profession == "Engineer" =
(MkPerson name age town state profession) : query xs
| otherwise = query xs
```

Thus,

```
query people
==> [MkPerson "Maija Meikäläinen" 35 "Rovaniemi" "Finland" "Engineer"]
```

Note that the types of the fields give little information on what is the
intended content in those fields. We need to remember in all places in
the code that `town`

goes before `state`

and not vice versa.

Haskell has a feature called *record syntax* that is helpful in these
kinds of cases. The datatype `Person`

can be defined as a record:

```
data Person =
MkPerson { name :: String
, age :: Int
, town :: String
, state :: String
, profession :: String }
deriving Show
```

We can still define values of `Person`

normally, but the `Show`

instance
prints the field names for us:

```
Prelude> MkPerson "Jane Doe" 21 "Houston" "Texas" "Engineer"
MkPerson {name = "Jane Doe", age = 21, town = "Houston", state = "Texas", profession = "Engineer"}
```

However, we can also define values using record syntax. Note how the fields don’t need to be in any specific order now that they have names.

```
Prelude> MkPerson {name = "Jane Doe", town = "Houston", profession = "Engineer", state = "Texas", age = 21}
MkPerson {name = "Jane Doe", age = 21, town = "Houston", state = "Texas", profession = "Engineer"}
```

Most importantly, We get *accessor functions* for the fields for free:

```
Prelude> :t profession
profession :: Person -> String
Prelude> profession (MkPerson "Jane Doe" 21 "Houston" "Texas" "Engineer")
"Engineer"
```

We can now rewrite the query function using these accessor functions:

```
query :: [Person] -> [Person]
query [] = []
query (x:xs)
| state x == "Finland" && profession x == "Engineer" =
x : query xs
| otherwise = query xs
```

You’ll probably agree that the code looks more pleasant now.

## Algebraic Datatypes: Summary

- Types are defined like this

```
data TypeName = ConstructorName FieldType FieldType2
| AnotherConstructor FieldType3
| OneMoreCons
```

- … or like this if we’re using type variables

```
data TypeName variable = Cons1 variable Type1
| Cons2 Type2 variable
```

- You can have one or more constructors
- Each constructor can have zero or more fields
- Constructors start with upper case, type variables with lower case
- Values are handled with pattern matching:

```
foo (ConstructorName a b) = a+b
foo (AnotherConstructor _) = 0
foo OneMoreCons = 7
```

- Constructors are just functions:

```
ConstructorName :: FieldType -> FieldType2 -> TypeName
Cons1 :: a -> Type1 -> TypeName a
```

- You can also define datatypes using record syntax:

```
data TypeName = Constructor { field1 :: Field1Type
, field2 :: Field2Type }
```

This gives you accessor functions like
`field1 :: TypeName -> Field1Type`

for free.

## Sidenote: Other Ways of Defining Types

In addition to the `data`

keyword, there are two additional ways of
defining types in Haskell.

The `newtype`

keyword works like `data`

, but you can only have a single
constructor with a single field. It’s sometimes wise to use `newtype`

for performance reasons.

The `type`

keyword introduces a *type alias*. Type aliases don’t affect
type checking, they just offer a shorthand for writing types. For
example the familiar `String`

type is an alias for `[Char]`

:

```
type String = [Char]
```

This means that whenever the compiler reads `String`

, it just
immediately replaces it with `[Char]`

. Type aliases seem useful, but
they can easily make reading type errors harder.

## How Do Algebraic Datatypes Work?

Remember how lists were represented in memory as linked lists? Let’s look in more detail at what algebraic datatypes look like in memory.

Haskell data forms *directed graphs* in memory. Every constructor is a
node, every field is an edge. Names (of variables) are pointers into
this graph. Different names can *share* parts of the structure. Here’s
an example with lists. Note how the last two elements of `x`

are shared
with `y`

and `z`

.

```
let x = [1,2,3,4]
y = drop 2 x
z = 5:y
```

What happens when you make a new version of a datastructure is called
*path copying*. Since Haskell data is immutable, the changed parts of
the datastructure get copied, while the unchanged parts can be shared
between the old and new versions.

Consider the definition of `++`

:

```
[] ++ ys = ys
(x:xs) ++ ys = x:(xs ++ ys)
```

We are making a copy of the first argument while we walk it. For every
`:`

constructor in the first input list, we are creating a new `:`

constructor in the output list. The second argument can be shared. It is
not used at all in the recursion. Visually:

One more way to think about it is this: we want to change the `tail`

pointer of the list element `(3:)`

. That means we need to make a new
`(3:)`

. However the `(2:)`

points to the `(3:)`

so we need a new copy of
the `(2:)`

as well. Likewise for `(1:)`

.

The graphs that we get when working with lists are fairly simple. As a more involved example, here is what happens in memory when we run the binary tree insertion example from earlier in this lecture.

```
insert :: Int -> Tree Int -> Tree Int
insert x Empty = Node x Empty Empty
insert x (Node y l r)
| x < y = Node y (insert x l) r
| x > y = Node y l (insert x r)
| otherwise = Node y l r
```

Note how the old and the new tree share the subtree with 3 and 4 since it wasn’t changed, but the node 7 that was “changed” and all nodes above it get copied.

## Self Checks

### Check 1

Why can’t we `map Nothing`

?

- Because
`Nothing`

doesn’t take arguments - Because
`Nothing`

returns nothing - Because
`Nothing`

is a constructor.

### Check 2

If we define `data Boing = Frick String Boing (Int -> Bool)`

, what is
the type of `Frick`

?

`Boing`

`String -> Boing -> Int -> Bool -> Boing`

`String -> Boing -> (Int -> Bool) -> Boing`

### Check 3

If we define `data ThreeLists a b c = ThreeLists [a] [b] [c]`

, what is
the type of the constructor `ThreeLists`

?

`[a] -> [b] -> [c] -> ThreeLists`

`a -> b -> c -> ThreeLists a b c`

`[a] -> [b] -> [c] -> ThreeLists a b c`

`[a] -> [b] -> [c] -> ThreeLists [a] [b] [c]`

### Check 4

If we define:

```
data TwoLists a b = TwoList { aList :: [a]
, bList :: [b] }
```

what is the type of the function `aList`

?

`aList`

is not a function, it is a field`TwoLists a b -> [a]`

`[a] -> TwoLists a b`

`[a]`

#### Acknowledgment:

This reading was originally written by **Joel Kaasinen** and
**John Lång** in their
open source textbook.
The book is licensed under the
Creative Commons Attribution-ShareAlike 4.0 International License,
which allows users to copy, modify, and distribute the book.

This reading was modified by Titus H. Klinge in 2021 and presented above under the same license in order to better serve the students of this course.