This document gives an introduction to the basics of the functional programming language Haskell. The main points we cover are using the Haskell interpreter, expressions, operators and functions in Haskell, as well as notational constructs for guards and local definitions
Just like Java, Haskell is both a compiled and an interpreted language. Haskell programs, consisting of function definitions are written in source files, which are compiled (a difference between Haskell and Java is that compilation is done by the interpreter). Expressions can then be typed in to the Haskell interpreter; evaluation of these expressions may involve using compiled code.
On the departmental network, the Haskell interpreter is invoked by typing
hugs at the Unix prompt (if this doesn't work, check that
/usr/bin is in your PATH variable).
You should then see something like the following:
Typing :quit or :q at the prompt will exit
the hugs interpreter.
(This is an example of a command
to the interpreter; apart from obeying commands, the other thing that
the interpreter does is evaluate expressions)
Expressions in Haskell are built from constants, operators and function application (and brackets).
2, -3.45,
True,
'a', "hello" and [1,3,5].
Note that all expressions in Haskell are typed: the types of the above
examples are, respectively, Integer, Float,
Bool,
Char, String, and List-of-Integers (which is
written as [Integer] in Haskell).. |
function composition |
*, /, ^ |
multiplication, division, exponentiation |
+, - |
addition, subtraction |
:, ++ |
add to list, concatenate lists |
==, /=, <=, etc |
equality, inequality, comparison operators |
&& |
logical and |
|| |
logical or |
even
to a number such as 39, we write even 39
(which will evaluate to False).2 + 4 * 6 evaluates to
26, whereas (2+4) * 6 evaluates to
36. Note that function application binds tighter than
all other operators, so that twice 2 + 1 evaluates to
5 (and twice -2 will produce a syntax error,
because the interpreter will parse it as (twice -) 2).
Haskell is a strongly typed language, which means that all expressions must
be built up from operands of the appropriate type.
Among the built-in basic types are:
Integer |
unbounded integer numbers |
Int |
32-bit integer numbers |
Rational |
unbounded rational numbers |
Float, Double |
single- and double-precision floating-point numbers |
Bool |
Boolean values (True, False) |
Char |
characters, e.g., 'a' |
Whereas Int has a fixed capacity for storing integers
(32 bits, so the largest number of this type is 2147483647),
there is in principle no bound on the size of numbers of type
Integer (the actual bound in practice is determined
by the memory capacity of the machine on which the interpreter
is running). The Haskell interpreter will, for example, quickly
evaluate expressions like 2^400 + 2^500.
Haskell also has some built-in type operators that build compound types. For example, the type of pairs of numbers is
(Integer, Integer)
Values of this type are also written with brackets and commas;
for example, (2,-4) and (2^8, 8/2) are both
expressions of type
(Integer, Integer)
In general, for any types a and b, the type
(a,b) is the type of pairs (x, y),
where x is of type a and
y is of type b.
For example,
(Char, (Int,Float))
is the type of pairs whose first element is of type Char
and whose second element is a pair of type (Int, Float).
In fact, this notation for compound types extends to tuples containing any number of elements (but note that each tuple type fixes the number, and type, of the elements). For example,
(Integer, Char, Char, Float, Bool)
is the type of "5-tuples" whose first element is an Integer,
whose second and third elements are both of type Char,
whose fourth element is of type Float, and whose fifth
element is of type Bool.
Such tuple types are therefore comparable to record types in imperative
languages.
The other main kind of compound type in Haskell (and in most typed
functional languages) are the list types.
For example, the type of lists of integer numbers is denoted
[Integer], and elements of this type can be written
as sequences like [1, 2, 1+2].
We look at lists in more detail later.
We will also see later that functions are first-class citizens, and can form parts of expressions: for example, what we call higher-order functions are functions that take functions as arguments. In Haskell,
a -> b
is the type of functions that take an argument of type a
and return a result of type b (here, a and
b are supposed to be any Haskell types).
For example, the built-in function even has type
Integer -> Bool
Writing programs in a functional programming language means defining functions.
The interpreter then uses these function definitions in evaluating expressions.
For example, suppose we create a file called doubling.hs containing
the following:
-- function to double a number twice x = 2 * x |
(comments in Haskell are preceded by --, and everything to the end of
the line is ignored by the interpreter).
We can now type :l doubling at the Hugs prompt, which causes the
interpreter to read in the file (see commands below).
The interpreter can now evaluate expressions like twice 7.
A snapshot showing this is given below.
A function definition generally has the following form:
<FunctionDef> ::= <Name> <Variable> = <Body>
where the <Body> is an expression in which the given
variable may (and probably does) occur.
We refer to the <Variable> as the formal parameter,
or as the bound variable;
for example, the bound variable in the definition of twice
is x.
Note that the interpreter will infer the type of the function
(by looking at the type of the expression in the <Body>), but we
could specify the type in order to document our code (and to get error
messages from the interpreter if we make a mistake in inferring
the type of the function):
-- function to double a number twice :: Integer -> Integer twice x = 2 * x |
A file can contain many function definitions:
-- function to double a number twice :: Integer -> Integer twice x = 2 * x -- function to divide a number by two halve :: Float -> Float halve x = x / 2 |
Sometimes the body of the function definition may be so long that it becomes desirable to break the expression over two or more lines. That is permitted, but must conform to the offside rule: anything belonging to a function definition must be indented from where that function definition began; anything that is not indented is taken to be a new function definition. Thus we can write:
twice x = 2 * x |
twice x = 2 * x |
The hugs interpreter evaluates expressions such as
12 <= 2*5 && 3 + 5 > 2^3 - 1
using standard arithmetic and boolean routines, which won't concern us here.
Rather, we will look at how expressions that contain function applications are
evaluated.
Given the definition
twice x = 2 * x |
the expression twice 3 is evaluated as follows:
3, is substituted for the
formal parameter, x, in the body of
the definition, giving 2 * 36.We can summarise this process as follows:
twice 3 => 2 * 3
=> 6
|
where the arrow (=>) can be read as "evaluates to".
As a more complex example, suppose we have the definitions
twice x = 2 * x square x = x * x |
The expression square (twice 3) is evaluated as follows:
square,
the actual parameter, twice 3, is substituted for
the formal parameter, x, giving the expression
(twice 3) * (twice 3)
twice 3
is evaluated to 6 as in the previous example,
giving the expression
6 * twice 3
twice 3 is evaluated as above,
giving the expression
6 * 6
36.The step-by-step summary of this process is:
square (twice 3) => (twice 3) * (twice 3)
=> 2 * 3 * (twice 3)
=> 6 * (twice 3)
=> 6 * (2 * 3)
=> 36
|
One exception to the form of function definition defined above involves pattern-matching. Suppose for example that we wanted to write a function that takes a pair of numbers and adds them. We can write
-- function to add a pair of numbers add :: (Integer,Integer) -> Integer add (x,y) = x + y |
The Haskell interpreter will read this in and evaluate expressions
such as add (2,6), which in this case would give the
result 8.
Instead of a simple bound variable in the function definition, we have
a pattern: in this case (x,y).
In general, a pattern is made up of special operators (such as
(_,_) in the example above) and bound variables.
When the function is applied to an expression that matches the
pattern, the effect of the matching is to substitute actual parameters
for the formal parameters in the body of the function.
In the case of add (2,6), the expression (2,6)
matches the pattern (x,y) by matching x with
2 and y with 6.
These values are substituted for x and y
in the body of the function, giving 2+6, which evaluates
to 8.
For future reference, there are two useful pre-defined functions
on tuple types of the form (A,B) that give the first
and second components of such pairs, fst (for "first")
and snd (for "second"), which are defined as follows:
fst (x,y) = x snd (x,y) = y |
(as an aside, the interpreter will complain if one of your .hs
files redefines a predefined finction such as fst.)
Besides functions that take elements of tuple types as arguments,
another way of defining functions that take more than one argument is
currying
(named after the logician Haskell Curry, who is also
honoured in the name of the programming language).
The curried definition of add would look like this:
-- function to add a pair of numbers curriedAdd :: Integer -> Integer -> Integer curriedAdd x y = x + y |
This definition allows the interpreter to evaluate expressions
such as curriedAdd 3 5 (which would evaluate to 8).
The spaces in this expression represent the operation of function application,
which is a left-associative operator, which means that the expression
is to be parsed as
(curriedAdd 3) 5
But what about the subexpression curriedAdd 3?
This is in fact a function, which is applied to the "second argument",
5!
In other words, curriedAdd is a function that takes
an Integer as argument and returns a function as a result;
this resulting function takes an Integer as argument
(the "second argument") and returns an Integer (the final
result).
For example, curriedAdd 3 is a function that takes a number
as argument and returns the result of adding 3 to that number.
The type of curriedAdd is therefore
Integer -> (Integer -> Integer)
(The "type operator" -> is right-associative,
which is why the brackets are omitted from the type declaration
preceding the definition of curriedAdd.)
We note in passing that recursive definitions often make use of
pattern-matching.
For example, recall that the factorial of a number n
is the product of all the numbers up to n, i.e.,
1 * 2 * 3 * ... * (n - 1) * n
How can we write a function fact that takes a number
n and returns its factorial.
Obviously, we can compute fact n if we can compute
the smaller product
1 * 2 * 3 * ... * (n - 1)
because multiplying this number by n gives us the number
we desire.
But this smaller product is the factorial of n-1,
i.e., it is fact (n-1), which tells us that
fact n = fact(n-1) * n
if n>1.
This gives (almost) an effective definition of the factorial function,
since, for example,
fact 4 => (fact 3) * 4
=> ((fact 2) * 3) * 4
=> (((fact 1) * 2) * 3) * 4
=> ((((fact 0) * 1) * 2) * 3) * 4
=> ??
|
Here we have to make a decision about fact 0:
should it have any value at all?
Clearly, the factorial of 1 is 1,
but how are we to interpret
1 * ... * 0
In fact we choose to define
fact 0 = 1(there is a reason for this choice). Now we can make the following Haskell definition:
-- compute the factorial of a number fact :: Integer -> Integer fact 0 = 1 fact n = fact (n-1) * n |
Here we are defining the function using pattern-matching:
there is no formal parameter in the first equation, just the pattern
0, which will only match with the actual parameter
0 itself.
When a function is defined by more than one definition, as fact
is, then in order to evaluate a function application, the interpreter
will go through each definition in order and check whether
the actual parameter matches with the pattern in the definition;
when it finds a definition with a pattern that matches, that definition
will be applied.
This means that when fact is applied to 0,
the first equation will be applied; for all other arguments,
the second equation will be applied.
Thus the complete evaluation of fact 4 is:
fact 4 => (fact 3) * 4
=> ((fact 2) * 3) * 4
=> (((fact 1) * 2) * 3) * 4
=> ((((fact 0) * 1) * 2) * 3) * 4
=> (((1 * 1) * 2) * 3) * 4)
=> 24
|
Similarly, the following definition of the "Fibonnacci function" uses pattern matching in the first two equations:
-- compute the nth value in the Fibonnacci sequence -- (counting from 0) fib :: Integer -> Integer fib 0 = 1 fib 1 = 1 fib n = fib (n-2) + fib (n-1) |
Suppose we want to define a function
maxOf2 :: Integer -> Integer -> Integer
that takes two numbers as arguments and returns the larger of the two
numbers as result.
We would want to write something like
maxOf2 x y = x if x >= y
maxOf2 x y = y if y >= x
Haskell allows such definitions, with the following syntax:
maxOf2 x y | x >= y = x | y >= x = y |
The general form of such definitions is
<FunctionName> <Parameters>
| <Test1> = <Exp1>
| <Test2> = <Exp2>
... ...
| <Testn> = <Expn>
| otherwise = <Exp>
where each of the <Test>s is a boolean expression
(in which the parameters may be used), and each of the
<Exp> is an expression representing the result
if the corresponding <Test> evaluates to
True.
The clause with the keyword otherwise is optional
(it specifies the result when none of the <Test>s
evaluates to True).
In the definition of maxOf2 above, note that if the
two actual parameters are the same, then both the tests evaluate
t True;
in this case, the expression associated with the first clause is
returned.
Other possible definitions include:
maxOf2 x y | x >= y = x | otherwise = y |
maxOf2 x y | x >= y = x | x < y = y |
Note that, in all of these definitions, the indentation (before the |)
is necessary by the offside rule.
Haskell also supports local definitions within function definitions. These can be used to introduce aliases for expressions to enhance clarity, e.g.,
sumSquares :: Integer -> Integer -> Integer sumSquares x y = sqX + sqY where sqX = x * x sqY = y * y |
The definitions following the keyword where are local:
if they are used in an expression typed at the interpreter's prompt,
the interpreter will report an error in the use of undefined
variables.
In this example, the local definitions consisted of local constants
sqX and sqY.
It is also possible to use patterns in local definitions,
for example, to match with pairs:
-- compute the nth value in the Fibonnacci sequence -- (counting from 0) fib :: Integer -> Integer fib n = f1 where (f1,f2) = fibs n -- function to compute the nth and (n+1)th values -- in the Fibonnacci sequence fibs :: Integer -> (Integer,Integer) fibs 0 = (1,1) fibs n = (f2, f1+f2) where (f1,f2) = fibs (n-1) |
Finally, functions can be defined locally:
-- compute the nth value in the Fibonnacci sequence
-- (counting from 0)
fib :: Integer -> Integer
fib n = f1
where
(f1,f2) = fibs n
fibs 0 = (1,1)
fibs n = (f2, f1+f2)
where
(f1,f2) = fibs (n-1)
|
When the file is loaded into the interpreter, the user can use
the function fib in expressions, but not the function
fibs, which is effectively hidden in the local
definition.
Note that the above example also illustrates nested local definitions,
as well as scope: in the local definition of fibe,
the names f1 and f2 are redefined.
A list of the commands recognised by the hugs interpreter can be obtained by
typing :? at the hugs prompt.
Among the more useful are:
:quit Quit the interpreter:load <filename> read in the file called
<filename>;
files containing Haskell definitions should have a .hs
extension, which can be omitted from the <filename>.:reload repeat the last load command:cd <directory> change the current working directory
to <directory>:!<shell command> execute the shell command;
for example :!ls will print the contents of the current
working directory:type <expr> give the type of the expression
<expr>
Most of these commands can be abbreviated to just the colon and first letter;
for example, :q will quit the interpreter, and
:t True will give the type of the constant
True (which is Bool).