COMP 317: Semantics of Programming Languages

Semantics of Data Types


So far, the only data type in our programming language is Int, the type of integers. Almost all imperative programming languages offer more data types; usually, these include arrays, and some languages also support linked lists (e.g., through pointers), trees, hash tables, and stacks. Some languages even allow programmers to define their own data types. Object-oriented languages allow programmers to define classes, which typically consist of local state (fields) together with methods. We might say that a class is an implementation of an abstract data type, since it gives a set of values (instances of the class), with operations for manipulating these (the methods). In this section of the module, we'll look at how we can include various data types in our programming language, and how we can give a semantics to these data types. In doing so, we'll sketch how to give a semantics to object-oriented programming languages, since we view a class as an implementation of an abstract data type.

We begin by looking at arrays, then turn to stacks, and as a final example we look at a class of two-dimensional points. In each case, we follow the same pattern:

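  1. Introduce the syntax that allows us to write programs that use the data type; this syntax includes a new sort of variable for the data type, together with the operations that programs can apply to such variables.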
  2. Specify an abstract data type that will provide the semantics of the data type. The elements of the abstract data type will be the values that a Store associates with a variable: just as a Store associates an integer number with a Variable, so it will associate an element of an abstract data type of arrays with an array variable, a stack with a stack variable, and so on.
  3. Give equations that define the behaviour of the new programming constructs, and of the new ways of forming Expressions and BooleanExpressions.



Arrays

Almost all imperative languages have arrays. We want to extend our language so we can write programs such as:

    a[0] := 3 ; a[1] := a[0] + 1 ; a[2] := a[0] + a[1]
which should result in the array variable a storing the values 3, 4 and 7 in components 0, 1 and 2, respectively.

Note that, to keep things simple, we won't bother declaring array variables or specifying sizes of arrays, so we won't concern ourselves with errors such as accessing an array out of bounds.

Syntax

First of all, we can see that we have a new sort of thing in the syntax of the programming language: array variables. We add to the specification of the syntax of our programming language:

    sort ArrayVariable .
to include array variables such as a in the example above. Particular array variables can be included by declaring constants of sort ArrayVariable, for example
    ops  a a1 a2  : -> ArrayVariable .
This gives us a programming language with just three array variables (this makes our language rather unrealistic, but has the benefit of simplicity: we want to concentrate on the semantics of languages with arrays, rather than worry about how many array variables are available to the programmer).

We can see from the example program above that the only other piece of syntax we need concern ourselves with is accessing and assigning to individual components of the array. To access a component, let's adopt the syntax _[_] which takes an array variable as first argument, and an expression as second argument. What about the return type of this syntactic operator? Note that array components serve two purposes: as a component that can be assigned to; for example:

    a[0] := 2 * 'x
and as an expression denoting the contents of the component:
    'y := 3 * a[1]
For the first of these, let's introduce a new sort, ArrayComponent. If we make this a subsort of Assignable, then we can assign to array components using the existing operation
    _:=_ : Assignable Expression -> BasicProgram .
Now our syntax is:
    fmod ARRAY-PGM is

      extending PROGRAM .

      sorts ArrayVariable ArrayComponent .
      subsort ArrayComponent < Assignable .

      ops a a1 a2 : -> ArrayVariable .

      op _[_] : ArrayVariable Expression -> ArrayComponent .

    endfm
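And we're done, provided that every Assignable is also an Expression in PROGRAM (which we assume here), so that array components can appear inside expressions such as 3 * a[1]. If that is not the case, a declaration subsort ArrayComponent < Expression is needed as well.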

Semantics

For the semantic domain, we need to specify what sort of thing will be stored by an array variable (ArrayVariable): this will be an abstract data type (ADT) of Arrays. Since the syntax of our programming language allows only accessing and assigning to array components, these are the only operations we need in our ADT. Effectively, arrays will simply be tables, storing integer values in "components" that are indexed by integers, with operations for "table look-up" and "table update". For table look-up, we use the same square-brackets notation as in the programming language, and for table update, we write A[I <- J] for the result of storing the value J in the array A at index I. We'll also add a constant, zeros, representing the array in which every component is 0.

    fmod ARRAY is

      protecting ZZ .

      sort Array .

      op zeros : -> Array .

      op _[_] : Array Int -> Int .
      op _[_<-_] : Array Int Int -> Array .

      var A : Array .
      vars I J K : Int .

      eq  zeros[I]  =  0 .

      eq  (A[ I <- J ])[I]  =  J .
      cq  (A[ I <- J ])[K]  =  A[K]   if  I =/= K .

    endfm
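
The equations of ARRAY can be tried out in an executable model. The following is a minimal Python sketch, not part of the specification: the names Zeros, Update and lookup are hypothetical, chosen to mirror zeros, _[_<-_] and _[_]. An array is represented as a term built from zeros by a series of updates, and look-up follows the three equations directly.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Zeros:
    """The constant zeros: every component is 0."""


@dataclass(frozen=True)
class Update:
    """The term A[I <- J]: array A with value J stored at index I."""
    array: "Array"
    index: int
    value: int


Array = Union[Zeros, Update]


def lookup(a: Array, k: int) -> int:
    """Model of _[_], following the equations of ARRAY."""
    while isinstance(a, Update):
        if a.index == k:
            return a.value      # (A[I <- J])[I] = J
        a = a.array             # (A[I <- J])[K] = A[K]  if I =/= K
    return 0                    # zeros[I] = 0


# The term (zeros[0 <- 3])[1 <- 4]
a = Update(Update(Zeros(), 0, 3), 1, 4)
assert lookup(a, 0) == 3
assert lookup(a, 1) == 4
assert lookup(a, 7) == 0        # never assigned, so still 0
```

Because look-up inspects the most recent update first, a later update to the same index shadows an earlier one, which is what the first update equation requires.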

For the semantics, we need to say that an array variable (ArrayVariable) in the programming language stores an array (Array), and that components of the array can be accessed using table look-up, and that assignment updates the array. We'll also say that in the initial state, every array variable stores zeros, with every component set to 0.

    th ARRAY-SEMANTICS is extending SEMANTICS .
                          extending ARRAY-PGM .
                          protecting ARRAY .

      op _[[_]] : Store ArrayVariable -> Array .

      var  S : Store .
      vars AV AV' : ArrayVariable .
      vars E E' : Expression .
      var  V : Variable .

      eq  initial [[ AV ]]  =  zeros .

      eq  S[[ AV[E] ]]  =  (S[[AV]])[ S[[E]] ] .

      eq  S ; AV[E] := E' [[ AV ]]  =  (S[[AV]])[ S[[E]] <- S[[E']] ] .

      cq  S ; AV[E] := E' [[ AV' ]]  =  S[[AV']]   if  AV =/= AV' .

      eq  S ; AV[E] := E' [[ V ]]  =  S[[V]] .
      eq  S ; V := E [[ AV ]]  =  S[[AV]] .

    endth
Note that the last two equations say that assigning to array components does not affect Variables, and that assignment to Variables does not affect array variables.
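
To see these equations at work on the example program from the start of this section, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the semantics itself: a store is modelled as a dictionary from array variable names to tables, ordinary Variables are left out, and the helper names (initial, component, assign, const, plus) are hypothetical.

```python
from typing import Callable, Dict

# A store maps each array variable name to a table of its assigned components;
# anything not in a table is 0, as in the equation initial [[ AV ]] = zeros.
Store = Dict[str, Dict[int, int]]
# An expression denotes a function from stores to integers.
Expr = Callable[[Store], int]


def initial() -> Store:
    return {}


def const(n: int) -> Expr:
    return lambda s: n


def plus(e1: Expr, e2: Expr) -> Expr:
    return lambda s: e1(s) + e2(s)


def component(av: str, e: Expr) -> Expr:
    """S[[ AV[E] ]] = (S[[AV]])[ S[[E]] ]"""
    return lambda s: s.get(av, {}).get(e(s), 0)


def assign(s: Store, av: str, e: Expr, e2: Expr) -> Store:
    """The store S ; AV[E] := E'."""
    table = dict(s.get(av, {}))
    table[e(s)] = e2(s)          # index and value are both evaluated in the old store S
    new_store = dict(s)          # every other array variable is left unchanged
    new_store[av] = table
    return new_store


# a[0] := 3 ; a[1] := a[0] + 1 ; a[2] := a[0] + a[1]
s = initial()
s = assign(s, "a", const(0), const(3))
s = assign(s, "a", const(1), plus(component("a", const(0)), const(1)))
s = assign(s, "a", const(2), plus(component("a", const(0)), component("a", const(1))))

assert [component("a", const(i))(s) for i in range(3)] == [3, 4, 7]
assert component("a1", const(0))(s) == 0    # a1 was never assigned to
```

The point to notice in assign is that both the index E and the value E' are evaluated in the store before the assignment, exactly as the right-hand side of the update equation uses S rather than the updated store.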



Stacks

Syntax

    fmod STACK-PGM is

      extending PROGRAM .

      sort StackVariable .

      ops st st1 st2  : -> StackVariable .

      op  _.empty()   : StackVariable     -> BasicProgram .
      op  _.push(_)   : StackVariable Expression -> BasicProgram .
      op  _.pop()     : StackVariable     -> BasicProgram .
      op  _.top()     : StackVariable     -> Expression .
      op  _.isEmpty() : StackVariable     -> BooleanExpression .

    endfm

Semantics

    fmod STACK is

      protecting ZZ .

      sort Stack .

      op  empty   :           -> Stack .
      op  push    : Stack Int -> Stack .
      op  pop     : Stack     -> Stack .
      op  top     : Stack     -> Int .
      op  isEmpty : Stack     -> Bool .

      var  ST : Stack .
      var  I  : Int .

      eq  isEmpty(ST)  =  ST == empty .

      eq  pop(push(ST, I))  =  ST .
      eq  top(push(ST, I))  =  I .

      eq  pop(empty)  =  empty .
      eq  top(empty)  =  0 .

    endfm
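
The STACK equations can likewise be checked against a small executable model. This is a minimal Python sketch with a representation of our own choosing (a tuple whose last element is the top of the stack); only the operation names come from the specification.

```python
from typing import Tuple

Stack = Tuple[int, ...]     # the top of the stack is the last element

empty: Stack = ()


def push(st: Stack, i: int) -> Stack:
    return st + (i,)


def pop(st: Stack) -> Stack:
    return st[:-1]          # slicing the empty tuple gives the empty tuple


def top(st: Stack) -> int:
    return st[-1] if st else 0


def is_empty(st: Stack) -> bool:
    return st == empty


st = push(push(empty, 1), 2)
assert pop(push(st, 5)) == st       # pop(push(ST, I)) = ST
assert top(push(st, 5)) == 5        # top(push(ST, I)) = I
assert pop(empty) == empty          # pop(empty) = empty
assert top(empty) == 0              # top(empty) = 0
assert is_empty(empty) and not is_empty(st)
```

As in the specification, pop and top are total: popping the empty stack leaves it empty and the top of the empty stack is 0, so no error values are needed.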

    th STACK-SEMANTICS is extending SEMANTICS .
                          extending STACK-PGM .
                          protecting STACK .

      op  _[[_]]  : Store StackVariable -> Stack .

      var  S : Store .
      vars SV SV' : StackVariable .
      var  E : Expression .
      var  V : Variable .

      eq  S[[ SV.isEmpty() ]]  =  isEmpty(S[[SV]]) .

      eq  S[[ SV.top() ]]  =  top(S[[SV]]) .

      eq  S ; SV.push(E) [[ SV ]]  =  push(S[[SV]], S[[E]]) .
      cq  S ; SV.push(E) [[ SV']]  =  S[[SV']]    if  SV =/= SV' .
      eq  S ; SV.push(E) [[ V ]]   =  S[[V]] .

      eq  S ; SV.pop() [[ SV ]]  =  pop(S[[SV]]) .
      cq  S ; SV.pop() [[ SV']]  =  S[[SV']]   if  SV =/= SV' .
      eq  S ; SV.pop() [[ V ]]   =  S[[V]] .

      eq  S ; SV.empty() [[ SV ]]  =  empty .
      cq  S ; SV.empty() [[ SV']]  =  S[[SV']]   if  SV =/= SV' .
      eq  S ; SV.empty() [[ V ]]   =  S[[V]] .

      eq  S ; V := E [[ SV ]]  =  S[[SV]] .

    endth
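
The following minimal Python sketch illustrates how the three commands change the store while _.top() and _.isEmpty() only read it. It makes two assumptions that go beyond the theory above: a stack variable that has never been assigned is treated as empty (STACK-SEMANTICS gives no equation for the initial store), and the arguments to push are passed as already evaluated integers. The function names are hypothetical; in particular, clear stands for the command _.empty().

```python
from typing import Dict, Tuple

Stack = Tuple[int, ...]         # the top of the stack is the last element
Store = Dict[str, Stack]        # stack variable name -> stack


def value(s: Store, sv: str) -> Stack:
    """S[[SV]]; a missing variable is assumed to hold the empty stack."""
    return s.get(sv, ())


def top(s: Store, sv: str) -> int:
    """S[[ SV.top() ]] = top(S[[SV]])"""
    st = value(s, sv)
    return st[-1] if st else 0


def is_empty(s: Store, sv: str) -> bool:
    """S[[ SV.isEmpty() ]] = isEmpty(S[[SV]])"""
    return value(s, sv) == ()


def push(s: Store, sv: str, n: int) -> Store:
    """S ; SV.push(E), where n is the value S[[E]]."""
    return {**s, sv: value(s, sv) + (n,)}


def pop(s: Store, sv: str) -> Store:
    """S ; SV.pop()"""
    return {**s, sv: value(s, sv)[:-1]}


def clear(s: Store, sv: str) -> Store:
    """S ; SV.empty()"""
    return {**s, sv: ()}


# st.empty() ; st.push(5) ; st.push(st.top() + 1) ; st.pop()
s = clear({}, "st")
s = push(s, "st", 5)
s = push(s, "st", top(s, "st") + 1)
assert top(s, "st") == 6
s = pop(s, "st")
assert top(s, "st") == 5
assert is_empty(s, "st1")       # commands on st do not affect st1
```

Each command returns a new store that differs from the old one only at the named stack variable, which is the content of the conditional equations with SV =/= SV'.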



Points

Consider the following Java class definition:

    class Point
    {
       private int xCoord;
       private int yCoord;

       Point(int x, int y)
       {
          xCoord = x;
          yCoord = y;
       }

       public int getX()
       {
          return xCoord;
       }

       public int getY()
       {
          return yCoord;
       }

       public void move(int dx, int dy)
       {
          xCoord += dx;
          yCoord += dy;
       }
    }
In Java, we can use this class declaration to write programs such as
    Point p = new Point(12,24);
    p.move(3,6);
which ends with the point p at x-coordinate 15 and y-coordinate 30.

Syntax

We can write similar programs in our programming language by extending its syntax with point variables, and operations corresponding to the Point constructor and the three methods in the class definition.

    fmod POINT-PGM is extending PROGRAM .

      sort PointVariable .

      ops  p p1 p2 : -> PointVariable .

      op  _.getX() : PointVariable -> Expression .
      op  _.getY() : PointVariable -> Expression .

      op  _.new(_,_) : PointVariable Expression Expression -> BasicProgram .

      op  _.move(_,_) : PointVariable Expression Expression -> BasicProgram .

    endfm

Semantics

To define the semantics of this extension to our language, we first need an abstract data type of points, which is essentially just pairs of integers:

    fmod POINT is protecting ZZ .

      sort Point .

      op  Pt[_,_] : Int Int -> Point .

      op  _.x : Point -> Int .
      op  _.y : Point -> Int .

      vars I J : Int .

      eq  (Pt[I,J]).x  =  I .
      eq  (Pt[I,J]).y  =  J .
      
    endfm
I.e., pairs are made by yoking two integers together. The usual notation for this is (I,J); here we write such a pair as Pt[I, J] (though of course we could introduce any notation we liked).

Now the program operation _.getX() just gives the first component of a pair, and _.getY() the second component; _.new(_,_) makes a new pair of the given integers, and _.move(_,_) adds the given integers to the first and second components, respectively, of a pair. This is stated formally in the following specification of the semantics of "point programs":

    th POINT-SEMANTICS is extending SEMANTICS .
                          extending POINT-PGM .
                          protecting POINT .

      op  _[[_]] : Store PointVariable -> Point .

      var  S : Store .
      vars PV PV' : PointVariable .
      vars E1 E2 : Expression .
      var  V : Variable .

      eq  S[[ PV.getX() ]]  =  (S[[PV]]).x .
      eq  S[[ PV.getY() ]]  =  (S[[PV]]).y .

      eq  S ; PV.new(E1,E2) [[ PV ]]  =  Pt[ S[[E1]], S[[E2]] ] .
      cq  S ; PV.new(E1,E2) [[ PV']]  =  S[[PV']]   if  PV =/= PV' .
      eq  S ; PV.new(E1,E2) [[ V ]]   =  S[[V]] .

      eq  S ; PV.move(E1,E2) [[ PV ]]
           =  Pt[ (S[[PV]]).x + (S[[E1]]), (S[[PV]]).y + (S[[E2]]) ] .
      cq  S ; PV.move(E1,E2) [[ PV']]  =  S[[PV']]   if  PV =/= PV' .
      eq  S ; PV.move(E1,E2) [[ V ]]   =  S[[V]] .

    endth
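
As a check against the Java example above, here is a minimal Python sketch of the point semantics. It is an illustration only: the store is a dictionary from point variable names to pairs, the arguments of new and move are passed as already evaluated integers, and the function names are hypothetical. Reading a point variable that has never been given a value raises a KeyError here, since the theory says nothing about that case.

```python
from typing import Dict, NamedTuple


class Pt(NamedTuple):
    """Model of Pt[I,J]; the fields play the role of _.x and _.y."""
    x: int
    y: int


Store = Dict[str, Pt]           # point variable name -> point


def new(s: Store, pv: str, i: int, j: int) -> Store:
    """S ; PV.new(E1,E2), where i and j are the values S[[E1]] and S[[E2]]."""
    return {**s, pv: Pt(i, j)}


def move(s: Store, pv: str, i: int, j: int) -> Store:
    """S ; PV.move(E1,E2): add the given values to the coordinates of PV."""
    old = s[pv]
    return {**s, pv: Pt(old.x + i, old.y + j)}


def get_x(s: Store, pv: str) -> int:
    """S[[ PV.getX() ]] = (S[[PV]]).x"""
    return s[pv].x


def get_y(s: Store, pv: str) -> int:
    """S[[ PV.getY() ]] = (S[[PV]]).y"""
    return s[pv].y


# p.new(12,24) ; p.move(3,6)
s = new({}, "p", 12, 24)
s = move(s, "p", 3, 6)
assert (get_x(s, "p"), get_y(s, "p")) == (15, 30)
```

Note that move reads the old point from the store before the command, just as the equation for _.move(_,_) uses S[[PV]] on its right-hand side.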



Exercises

  1. Suppose we wish to add Boolean variables to our programming language, so that we could write programs such as the following, which searches an array to see if (the value of) 'e occurs in the first 100 components of an array, and uses a Boolean variable found:
      'i := 0 ;
      found = false ;
      while ('i < 100) and (not found)
      do
         if  a['i] is 'e
         then  found = true
         else  'i := 'i + 1
         fi
      od
    
    Extend the programming language with two Boolean variables found and done, and an assignment operator for Boolean variables. Then extend the semantics of the language to describe the semantics of assignment to Boolean variables.
     
  2. In some programming languages that support stacks, the pop operation returns the value that is on the top of the stack (so it's like top in our example above), and also has the effect of removing the top element (so it's also like pop in our example above). One way of implementing this kind of pop operation is to supply it with a Variable as argument. Thus, for example, the following program adds up all the values in the stack variable st:
        'sum := 0 ;
        while not st.isEmpty()
        do
           st.pop('x) ;
           'sum := 'sum + 'x
        od
    
    Here, st.pop('x) pops the stack stored in st, and assigns the value of the top element in the stack to 'x.
    Modify the modules STACK-PGM and STACK-SEMANTICS above to implement this pop operation (the module STACK remains unchanged).
     
  3. Queues are first-in-first-out lists of numbers, with operations to add a number to the tail of the queue, to get the number at the head of the queue, and to remove the first element in the queue, as well as an operation to test if the queue is empty. An abstract data type of queues is specified as follows:
        fmod QUEUE is protecting ZZ .
    
          sort Queue .
    
          op  empty : -> Queue .
          op  add : Queue Int -> Queue .
          op  get : Queue -> Int .
          op  removeHead : Queue -> Queue .
          op  isEmpty : Queue -> Bool .
    
          var  Q : Queue .
          vars I J : Int .
    
          cq  get(add(Q,I))  =  I       if  isEmpty(Q) .
          cq  get(add(Q,I))  =  get(Q)  if  not isEmpty(Q) .
          eq  get(empty)  =  0 .
    
          cq  removeHead(add(Q,I))  =  empty   if  isEmpty(Q) .
          cq  removeHead(add(Q,I))  =  add(removeHead(Q), I)   
                                               if  not isEmpty(Q) .
          eq  removeHead(empty)  =  empty .   
    
          eq  isEmpty(Q)  =  Q == empty .
    
        endfm
    
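    Suppose we want to add queues to our programming language, with just one queue variable, q, and two operations: q.add(E), which adds the value of the expression E to the tail of the queue, and q.get(V), which removes the number at the head of the queue and assigns it to the Variable V. For example, the following program adds the values 5 then 2 to the queue: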
        q.add(5) ; q.add(2)
    
    and the following program subtracts the second element in the queue from the first element and stores the result in 'd:
        q.get('x) ; q.get('y) ; 'd := 'x - 'y
    
    For example, if the queue contained the elements 5 then 2, in that order, then the program above sets 'd to 3.
    Extend the syntax of the programming language with these operations, then (using QUEUE), give a semantics for these operations.
     
  4. Do the linked-list example from the 2004 exam paper.



Grant Malcolm

Last modified: Tue Sep 17 22:16:19 BST 2002