Introduction to Haskell


This document gives an introduction to the basics of the functional programming language Haskell. The main points we cover are using the Haskell interpreter, expressions, operators and functions in Haskell, as well as notational constructs for guards and local definitions

Contents



The Haskell Interpreter

Just like Java, Haskell is both a compiled and an interpreted language. Haskell programs, consisting of function definitions are written in source files, which are compiled (a difference between Haskell and Java is that compilation is done by the interpreter). Expressions can then be typed in to the Haskell interpreter; evaluation of these expressions may involve using compiled code.

On the departmental network, the Haskell interpreter is invoked by typing hugs at the Unix prompt (if this doesn't work, check that /usr/bin is in your PATH variable). You should then see something like the following:


Snapshot of a terminal window running hugs


Typing :quit or :q at the prompt will exit the hugs interpreter. (This is an example of a command to the interpreter; apart from obeying commands, the other thing that the interpreter does is evaluate expressions)



Expressions

Expressions in Haskell are built from constants, operators and function application (and brackets).

Constants
are literal denotations, for example 2, -3.45, True, 'a', "hello" and [1,3,5]. Note that all expressions in Haskell are typed: the types of the above examples are, respectively, Integer, Float, Bool, Char, String, and List-of-Integers (which is written as [Integer] in Haskell).

 
Operators
include the standard arithmetic and boolean operators, as well as operators that work on lists. The following table lists some of the more useful Haskell operations, in decreasing binding power from top to bottom:
 
. function composition
*, /, ^ multiplication, division, exponentiation
+, - addition, subtraction
:, ++ add to list, concatenate lists
==, /=, <=, etc equality, inequality, comparison operators
&& logical and
|| logical or

 
Function application
is written in Prefix Notation. Thus, to apply a function even to a number such as 39, we write even 39 (which will evaluate to False).

 
Brackets
are used to disambiguate expressions and to force evaluation of subexpressions contained in brackets before evaluation of the whole expression. For example, 2 + 4 * 6 evaluates to 26, whereas (2+4) * 6 evaluates to 36. Note that function application binds tighter than all other operators, so that twice 2 + 1 evaluates to 5 (and twice -2 will produce a syntax error, because the interpreter will parse it as (twice -) 2).
A snapshot showing hugs evaluating various expressions is given below:


Snapshot of hugs evaluating expressions



Types

Haskell is a strongly typed language, which means that all expressions must be built up from operands of the appropriate type. Among the built-in basic types are:
 
Integer unbounded integer numbers
Int 32-bit integer numbers
Rational unbounded rational numbers
Float, Double single- and double-precision floating-point numbers
Bool Boolean values (True, False)
Char characters, e.g., 'a'

 

Whereas Int has a fixed capacity for storing integers (32 bits, so the largest number of this type is 2147483647), there is in principle no bound on the size of numbers of type Integer (the actual bound in practice is determined by the memory capacity of the machine on which the interpreter is running). The Haskell interpreter will, for example, quickly evaluate expressions like 2^400 + 2^500.

Haskell also has some built-in type operators that build compound types. For example, the type of pairs of numbers is

    (Integer, Integer)
Values of this type are also written with brackets and commas; for example, (2,-4) and (2^8, 8/2) are both expressions of type
    (Integer, Integer)
In general, for any types a and b, the type (a,b) is the type of pairs (x, y), where x is of type a and y is of type b. For example,
    (Char, (Int,Float))
is the type of pairs whose first element is of type Char and whose second element is a pair of type (Int, Float).

In fact, this notation for compound types extends to tuples containing any number of elements (but note that each tuple type fixes the number, and type, of the elements). For example,

    (Integer, Char, Char, Float, Bool)
is the type of "5-tuples" whose first element is an Integer, whose second and third elements are both of type Char, whose fourth element is of type Float, and whose fifth element is of type Bool. Such tuple types are therefore comparable to record types in imperative languages.

The other main kind of compound type in Haskell (and in most typed functional languages) are the list types. For example, the type of lists of integer numbers is denoted [Integer], and elements of this type can be written as sequences like [1, 2, 1+2]. We look at lists in more detail later.

We will also see later that functions are first-class citizens, and can form parts of expressions: for example, what we call higher-order functions are functions that take functions as arguments. In Haskell,

    a -> b
is the type of functions that take an argument of type a and return a result of type b (here, a and b are supposed to be any Haskell types). For example, the built-in function even has type
    Integer -> Bool

Function Definitions

Writing programs in a functional programming language means defining functions. The interpreter then uses these function definitions in evaluating expressions. For example, suppose we create a file called doubling.hs containing the following:

-- function to double a number

twice x  =  2 * x

(comments in Haskell are preceded by --, and everything to the end of the line is ignored by the interpreter). We can now type :l doubling at the Hugs prompt, which causes the interpreter to read in the file (see commands below). The interpreter can now evaluate expressions like twice 7. A snapshot showing this is given below.


Snapshot of hugs evaluating expressions


A function definition generally has the following form:

    <FunctionDef>  ::=  <Name> <Variable> = <Body>
where the <Body> is an expression in which the given variable may (and probably does) occur. We refer to the <Variable> as the formal parameter, or as the bound variable; for example, the bound variable in the definition of twice is x. Note that the interpreter will infer the type of the function (by looking at the type of the expression in the <Body>), but we could specify the type in order to document our code (and to get error messages from the interpreter if we make a mistake in inferring the type of the function):

-- function to double a number

twice :: Integer -> Integer

twice x  =  2 * x

A file can contain many function definitions:

-- function to double a number

twice :: Integer -> Integer

twice x  =  2 * x



-- function to divide a number by two

halve :: Float -> Float

halve x  =  x / 2

Sometimes the body of the function definition may be so long that it becomes desirable to break the expression over two or more lines. That is permitted, but must conform to the offside rule: anything belonging to a function definition must be indented from where that function definition began; anything that is not indented is taken to be a new function definition. Thus we can write:

twice x  =
 2 * x

but not

twice x  =  
2 * x


Evaluation

The hugs interpreter evaluates expressions such as

    12 <= 2*5 && 3 + 5 > 2^3 - 1
using standard arithmetic and boolean routines, which won't concern us here. Rather, we will look at how expressions that contain function applications are evaluated. Given the definition

twice x  =  2 * x

the expression twice 3 is evaluated as follows:

We can summarise this process as follows:

twice 3  =>  2 * 3
         =>  6

where the arrow (=>) can be read as "evaluates to".

As a more complex example, suppose we have the definitions

twice x  =  2 * x

square x =  x * x

The expression square (twice 3) is evaluated as follows:

The step-by-step summary of this process is:

square (twice 3)  =>  (twice 3) * (twice 3)
                  =>  2 * 3 * (twice 3)
                  =>  6 * (twice 3)
                  =>  6 * (2 * 3)
                  =>  36


Pattern-Matching and Currying

One exception to the form of function definition defined above involves pattern-matching. Suppose for example that we wanted to write a function that takes a pair of numbers and adds them. We can write

-- function to add a pair of numbers
add :: (Integer,Integer) -> Integer

add (x,y)  =  x + y

The Haskell interpreter will read this in and evaluate expressions such as add (2,6), which in this case would give the result 8.

Instead of a simple bound variable in the function definition, we have a pattern: in this case (x,y). In general, a pattern is made up of special operators (such as (_,_) in the example above) and bound variables. When the function is applied to an expression that matches the pattern, the effect of the matching is to substitute actual parameters for the formal parameters in the body of the function. In the case of add (2,6), the expression (2,6) matches the pattern (x,y) by matching x with 2 and y with 6. These values are substituted for x and y in the body of the function, giving 2+6, which evaluates to 8.

For future reference, there are two useful pre-defined functions on tuple types of the form (A,B) that give the first and second components of such pairs, fst (for "first") and snd (for "second"), which are defined as follows:

fst (x,y)  =  x
snd (x,y)  =  y

(as an aside, the interpreter will complain if one of your .hs files redefines a predefined finction such as fst.)

Currying

Besides functions that take elements of tuple types as arguments, another way of defining functions that take more than one argument is currying (named after the logician Haskell Curry, who is also honoured in the name of the programming language). The curried definition of add would look like this:

-- function to add a pair of numbers

curriedAdd :: Integer -> Integer -> Integer

curriedAdd x y  =  x + y

This definition allows the interpreter to evaluate expressions such as curriedAdd 3 5 (which would evaluate to 8). The spaces in this expression represent the operation of function application, which is a left-associative operator, which means that the expression is to be parsed as

    (curriedAdd 3) 5
But what about the subexpression curriedAdd 3? This is in fact a function, which is applied to the "second argument", 5! In other words, curriedAdd is a function that takes an Integer as argument and returns a function as a result; this resulting function takes an Integer as argument (the "second argument") and returns an Integer (the final result). For example, curriedAdd 3 is a function that takes a number as argument and returns the result of adding 3 to that number. The type of curriedAdd is therefore
    Integer -> (Integer -> Integer)
(The "type operator" -> is right-associative, which is why the brackets are omitted from the type declaration preceding the definition of curriedAdd.)

A note on recursive functions

We note in passing that recursive definitions often make use of pattern-matching. For example, recall that the factorial of a number n is the product of all the numbers up to n, i.e.,

    1 * 2 * 3 * ... * (n - 1) * n
How can we write a function fact that takes a number n and returns its factorial. Obviously, we can compute fact n if we can compute the smaller product
    1 * 2 * 3 * ... * (n - 1)
because multiplying this number by n gives us the number we desire. But this smaller product is the factorial of n-1, i.e., it is fact (n-1), which tells us that
    fact n  =  fact(n-1) * n
if n>1. This gives (almost) an effective definition of the factorial function, since, for example,

fact 4  =>  (fact 3) * 4
        =>  ((fact 2) * 3) * 4
        =>  (((fact 1) * 2) * 3) * 4
        =>  ((((fact 0) * 1) * 2) * 3) * 4
        =>  ??

Here we have to make a decision about fact 0: should it have any value at all? Clearly, the factorial of 1 is 1, but how are we to interpret

    1 * ... * 0
In fact we choose to define
fact 0 = 1
(there is
a reason for this choice). Now we can make the following Haskell definition:

-- compute the factorial of a number

fact :: Integer -> Integer

fact 0  =  1
fact n  =  fact (n-1) * n

Here we are defining the function using pattern-matching: there is no formal parameter in the first equation, just the pattern 0, which will only match with the actual parameter 0 itself.

When a function is defined by more than one definition, as fact is, then in order to evaluate a function application, the interpreter will go through each definition in order and check whether the actual parameter matches with the pattern in the definition; when it finds a definition with a pattern that matches, that definition will be applied. This means that when fact is applied to 0, the first equation will be applied; for all other arguments, the second equation will be applied. Thus the complete evaluation of fact 4 is:

fact 4  =>  (fact 3) * 4
        =>  ((fact 2) * 3) * 4
        =>  (((fact 1) * 2) * 3) * 4
        =>  ((((fact 0) * 1) * 2) * 3) * 4
        =>  (((1 * 1) * 2) * 3) * 4)
        =>  24

Similarly, the following definition of the "Fibonnacci function" uses pattern matching in the first two equations:

-- compute the nth value in the Fibonnacci sequence
-- (counting from 0)

fib :: Integer -> Integer

fib 0  =  1
fib 1  =  1
fib n  =  fib (n-2) + fib (n-1)


Guards

Suppose we want to define a function

    maxOf2 :: Integer -> Integer -> Integer
that takes two numbers as arguments and returns the larger of the two numbers as result. We would want to write something like
    maxOf2 x y  =  x    if x >= y
    maxOf2 x y  =  y    if y >= x
Haskell allows such definitions, with the following syntax:

maxOf2 x y  
 | x >= y    = x
 | y >= x    = y

The general form of such definitions is

    <FunctionName> <Parameters>
      | <Test1>     = <Exp1>
      | <Test2>     = <Exp2>
        ...             ...
      | <Testn>     = <Expn>
      | otherwise   = <Exp>
where each of the <Test>s is a boolean expression (in which the parameters may be used), and each of the <Exp> is an expression representing the result if the corresponding <Test> evaluates to True. The clause with the keyword otherwise is optional (it specifies the result when none of the <Test>s evaluates to True). In the definition of maxOf2 above, note that if the two actual parameters are the same, then both the tests evaluate t True; in this case, the expression associated with the first clause is returned.

Other possible definitions include:

maxOf2 x y  
 | x >= y     = x
 | otherwise  = y
maxOf2 x y
 | x >= y    = x
 | x < y     = y

Note that, in all of these definitions, the indentation (before the |) is necessary by the offside rule.


Local Definitions

Haskell also supports local definitions within function definitions. These can be used to introduce aliases for expressions to enhance clarity, e.g.,

sumSquares :: Integer -> Integer -> Integer

sumSquares x y
 =  sqX + sqY
 where
 sqX = x * x
 sqY = y * y

The definitions following the keyword where are local: if they are used in an expression typed at the interpreter's prompt, the interpreter will report an error in the use of undefined variables.

In this example, the local definitions consisted of local constants sqX and sqY. It is also possible to use patterns in local definitions, for example, to match with pairs:

-- compute the nth value in the Fibonnacci sequence
-- (counting from 0)

fib :: Integer -> Integer

fib n  =  f1
  where 
  (f1,f2) = fibs n


-- function to compute the nth and (n+1)th values
-- in the Fibonnacci sequence

fibs :: Integer -> (Integer,Integer)

fibs 0  =  (1,1)
fibs n  =  (f2, f1+f2)
  where
  (f1,f2) = fibs (n-1)

Finally, functions can be defined locally:

-- compute the nth value in the Fibonnacci sequence
-- (counting from 0)

fib :: Integer -> Integer

fib n  =  f1
  where 
  (f1,f2) = fibs n
  fibs 0  =  (1,1)
  fibs n  =  (f2, f1+f2)
     where
     (f1,f2) = fibs (n-1)

When the file is loaded into the interpreter, the user can use the function fib in expressions, but not the function fibs, which is effectively hidden in the local definition. Note that the above example also illustrates nested local definitions, as well as scope: in the local definition of fibe, the names f1 and f2 are redefined.


Commands

A list of the commands recognised by the hugs interpreter can be obtained by typing :? at the hugs prompt. Among the more useful are:

Most of these commands can be abbreviated to just the colon and first letter; for example, :q will quit the interpreter, and :t True will give the type of the constant True (which is Bool).


 
Grant Malcolm
Last modified: Mon Nov 12 17:16:08 GMT 2001