2. Basic concepts:
2.1. Assignment commands
One of the characteristic features of computers is that they have a store into which it is possible to put information and from which it can subsequently be recovered. Furthermore the act of inserting an item into the store erases whatever was in that particular area of the store before—in other words the process is one of overwriting. This leads to the assignment command which is a prominent feature of most programming languages.
The simplest forms of assignments such as
ð x := 3
ð x := y + 1
ð x := x + 1
Lend themselves to very simple explications. ‘Set x equal to 3’, ‘Set x to be the value of y plus 1’ or ‘Add one to x’. But this simplicity is deceptive; the examples are themselves special cases of a more general form and the first explications which come to mind will not generalize satisfactorily. This situation crops up over and over again in the exploration of a new field; it is important to resist the temptation to start with a confusingly simple example.
The following assignment commands show this danger.
i := a > b → j,k ( .See note 1)
A[i] := A[a > b → j,k]
A[a > b→ j, k] := A[i]
a > b→ j, k := I ( .See note 2)
All these commands are legal in CPL (and all but the last, apart from minor syntactic alterations, in ALGOL also). They show an increasing complexity of the expressions written on the left of the assignment. We are tempted to write them all in the general form
ἐ1 := ἐ2
where ἐ1 and ἐ2 stand for expressions, and to try as an explication something like ‘evaluate the two expressions and then do the assignment’. But this clearly will not do, as the meaning of an expression (and a name or identifier is only a simple case of an expression) on the left of an assignment is clearly different from its meaning on the right. Roughly speaking an expression on the left stands for an ‘address’ and one on the right for a ‘value’ which will be stored there. We shall therefore accept this view and say that there are two values associated with an expression or identifier. In order to avoid the overtones which go with the word ‘address’ we shall give these two values the neutral names: L-value for the address-like object appropriate on the left of an assignment, and R-value for the contents-like object appropriate for the right.
2.2. L-values and R-values:
An L-value represents an area of the store of the computer. We call this a location rather than an address in order to avoid confusion with the normal store-addressing mechanism of the computer. There is no reason why a location should be exactly one machine-word in size— the objects discussed in programming languages may be, like complex or multiple precision numbers, more than one word long, or, like characters, less. Some locations are addressable (in which case their numerical machine address may be a good representation) but some are not. Before we can decide what sort of representation a general, non-addressable location should have, we should consider what properties we require of it.
The two essential features of a location are that it has a content—i.e. an associated R-value—and that it is in general possible to change this content by a suitable updating operation. These two operations are sufficient to characterise a general location which are consequently sometimes known as ‘Load-Update Pairs’ or LUPs. They will be discussed
again in (Section 4.1)
In CPL a programmer can introduce a new quantity and give it a value by an initialized definition such as
let p = 3.5 (In ALGOL this would be done by real p; p := 3.5;).
This introduces a new use of the name p (ALGOL uses the term ‘identifier’ instead of name), and the best way of looking at this is that the activation of the definition causes a new location not previously used to be set up as the L-value of p and that the R-value 3.5 is then assigned to this location. The relationship between a name and its L-value cannot be altered by assignment, and it is this fact which makes the L-value important. However in both ALGOL and CPL one name can have several different L-values in different parts of the program. It is the concept of scope (sometimes called lexicographical scope) which is controlled by the block structure which allows us to determine at any point which L-value is relevant. In CPL, but not in ALGOL, it is also possible to have several names with the same L-value. This is done by using a special form of definition:
Let, q ͠= P
which has the effect of giving the name of the same L-value as p (which must already exist). This feature is generally used when the right side of the definition is a more complicated expression than a simple name. Thus if M is a matrix, the definition
let x ͠= M[2,2]
gives x the same L-value as one of the elements of the matrix. It is then said to be sharing with M[2,2], and an assignment to x will have the same effect as one to M[2,2]. It is worth noting that the expression on the right of this form of definition is evaluated in the L-mode to get an L-value at the time the definition is obeyed. It is this L-value which is associated with x. Thus if we have
let i = 2
let x ͠= M[i,i]
the L-value of x will remain that of M[2,2]. M[i,i] is an example of an anonymous quantity i.e., an expression rather than a simple name—which has both an L-value and an R-value. There are other expressions, such as a+b, which only have R-values. In both cases the expression has no name as such although it does have either one value or two.
It is important to be clear about this as a good deal of confusion can be caused by differing uses of the terms. ALGOL 60 uses ‘identifier’ where we have used ‘name’, and reserves the word ‘name’ for a wholly different use concerned with the mode of calling parameters for a procedure. (See Section 3.4.3.) ALGOL X, on the other hand, appears likely to use the word ‘name’ to mean approximately what we should call an L-value, (and hence something which is a location or generalised address). The term reference is also used by several languages to mean (again approximately) an L-value. It seems to me wiser not to make a distinction between the meaning of ‘name’ and that of ‘identifier’ and I shall use them interchangeably. The important feature of a name is that it has no internal structure at any rate in the context in which we are using it as a name. Names are thus atomic objects and the only thing we know about them is that given two names it is always possible to determine whether they are equal (i.e., the same name) or not.
We use theword ‘number’ for the abstract object and ‘numeral’ for its written representation.Thus 24 and XXIV are two different numerals representing the same number. There is often some confusion about the status of numerals in programming languages. One view commonly expressed is that numerals are the ‘names of numbers’ which presumably means that every distinguishable numeral has an appropriate R-value associated with it. This seems
to me an artificial point of view and one which falls foul of Occam’s razor by unnecessarily multiplying the number of entities (in this case names). This is because it overlooks the important fact that numerals in general do have an internal structure and are therefore not atomic in the sense that we said names were in the last section. An interpretation more in keeping with our general approach is to regard numerals as R-value expressions written according to special rules. Thus for example the numeral 253 is a syntactic variant for the expression
2*102 C+ 5 * 10 + 3
while the CPL constant 8 253 is a variant of
2 * 82 + 5* 8 +3
Local rules for special forms of expression can be regarded as a sort of ‘micro-syntax’ and form an important feature of programming languages. The micro-syntax is frequently used in a preliminary ‘pre-processing’ or ‘lexical’ pass of compilers to deal with the recognition of names, numerals, strings, basic symbols (e.g. boldface words in ALGOL) and similarobjects which are represented in the input stream by strings of symbols in spite of beingatomic inside the language.With this interpretation the only numerals which are also names are the single digits and these are, of course, constants with the appropriate R-value.
2.6. Conceptual model
It is sometimes helpful to have a picture showing the relationships between the various objects in the programming language, their representations in the store of a computer and the abstract objects to which they correspond. Figure 1 is an attempt to portray the conceptual model which is being used in this course.
On the left are some of the components of the programming language. Many of these correspond to either an L-value or an R-value and the correspondence is indicated by an arrow terminating on the value concerned. Both L-values and R-values are in the idealized store, a location being represented by a box and its contents by a dot inside it. R-values without corresponding L-values are represented by dots without boxes, and R-values which are themselves locations (as, for example, that of a vector) are given arrows which terminate on another box in the idealised store. R-values which correspond to numbers are given arrows which terminate in the right hand part of the diagram which represents the abstract objects with which the program deals. The bottom section of the diagram, which is concerned with vectors and vector elements will be more easily understood after reading the section on compound data structures. (Section 3.7.)