How are programming languages constructed

Constructed Types

Programming languages usually provide developers with very few primitive types. If programmers want to increase the available types, then they can define new composite types. If a type is a set, then a constructed type is just a new set built from other types.


Enumerations

Enumerations are the simplest kind of constructed types. In some programming languages, an enumeration is a subset of another type. In others, an enumeration is an independent set of elements. C takes the former approach and SML the latter. The program below illustrates how enumerations are used in C:


In C, enumerations are a subset of the integers. Thus, any operation that can be applied to an integer can also be applied to the elements of an enumeration. These enumerations are not even closed sets: it is possible to add two elements of the same enumeration and obtain a result that is not part of that type.
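
The same lack of closure can be observed outside C. As a rough analogue only, Python's `enum.IntEnum` also makes enumeration members a subset of the integers. The enumeration `Coin` and its members are hypothetical names chosen for this sketch.

```python
from enum import IntEnum


# Hypothetical enumeration, used only for illustration.
class Coin(IntEnum):
    PENNY = 1
    NICKEL = 5
    DIME = 10


# Members are integers, so integer arithmetic applies to them.
total = Coin.PENNY + Coin.NICKEL

# The set is not closed: the sum is a plain int that names no member of Coin.
member_values = {member.value for member in Coin}
is_member = total in member_values

print(total, is_member)  # prints: 6 False
```

The sum is a perfectly valid integer, but it does not correspond to any constant of the enumeration.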

In some programming languages, enumerations are not a subset of an existing type. Instead, they form a set of their own. In SML, for example, enumerations are defined in this way:


An arithmetic operation on enumeration values, such as adding two of them, will not type check in SML, as enumerations are not integers. The only operation that is possible, in this case, is comparison.
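
A loosely comparable behaviour exists in Python's plain `enum.Enum`, sketched below with a hypothetical `Color` enumeration. One important difference is an assumption worth stating loudly: SML rejects the addition at compile time, whereas Python only rejects it when the expression is evaluated.

```python
from enum import Enum


# Hypothetical enumeration that is NOT a subset of the integers.
class Color(Enum):
    RED = 1
    GREEN = 2


# Arithmetic is rejected (at run time in Python, at compile time in SML).
try:
    Color.RED + Color.GREEN
    addition_allowed = True
except TypeError:
    addition_allowed = False

# Comparison for equality is the operation that remains available.
same = Color.RED == Color.RED
different = Color.RED == Color.GREEN

print(addition_allowed, same, different)
```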


Tuples

Tuples give us one of the most important kinds of constructed types. Conceptually, tuples are cartesian products of other types. An n-tuple is an ordered sequence of n elements. If a tuple type is the product of two types, A and B, then it has |A| × |B| elements, where |A| and |B| are the cardinalities of the sets A and B. As an example, below we have the declaration of a tuple type in SML:
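
The cardinality rule can be checked with a small sketch, written here in Python rather than SML. The two base sets, `booleans` and `suits`, are hypothetical examples.

```python
from itertools import product

# Two hypothetical base types, modelled as finite sets of values.
booleans = [False, True]
suits = ["clubs", "diamonds", "hearts", "spades"]

# The tuple type is the cartesian product of the two base types.
pairs = list(product(booleans, suits))

# Its cardinality is the product of the cardinalities of the base types.
print(len(pairs), len(booleans) * len(suits))
```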


The typical operation that we can apply to a tuple is indexing its elements. In SML we index tuples via the # operator:


Tuples are very useful as a quick way to build aggregate data structures. These types are the only alternative that many programming languages provide to build functions that return multiple elements. For instance, the function below, written in Python, outputs, for each actual parameter, a pair with the parameter itself and its ceiling, i.e., its value rounded up:
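
A minimal sketch of such a function follows. The name `with_ceil` is hypothetical, and returning the pairs in a list (instead of printing them) is an assumption made for this example.

```python
import math


def with_ceil(*numbers):
    """Return, for each argument, the pair (argument, ceiling of argument)."""
    return [(x, math.ceil(x)) for x in numbers]


# Each result is a 2-tuple, so one call hands back several values at once.
print(with_ceil(1.2, 3.0, -0.5))
```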


In SML, as well as in any language that follows the lambda-calculus more closely, every function receives only one parameter. Tuples provide, in this case, an alternative to simulate the invocation of the function with multiple parameters:



Some programming languages support tuples with named components. Tuples with named components are commonly known as record or structure types:
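
As an illustration in Python (one possible sketch, not tied to any particular language discussed here), `collections.namedtuple` builds a tuple type whose components have names. The type `Point` and its fields are hypothetical.

```python
from collections import namedtuple

# A tuple type with named components: a record.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1.5, y=-2.0)

# Components can be read by name or, as in any tuple, by position.
print(p.x, p[0])
print(p == (1.5, -2.0))
```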


The C language has a construction named struct that can be used to represent tuples. The next example illustrates its use.


The complex structure represents complex numbers with their real and imaginary parts. Each element of the struct has a name. Moreover, each element of a struct is associated with a type, which has an internal representation. Some languages, such as SML, hide this representation from the programmer. However, there are languages, such as C, that make this representation visible to the programmer. In this case, the language specification allows the programmer to tell exactly how a tuple is represented. It defines which parts of the representation must be the same in all C systems and which can differ between implementations. Because the internal representation of records is visible in C, it is possible to write a program that iterates over the fields of a tuple. As an example, the program below defines a tuple of integers (struct mySt) and iterates over its elements using pointer manipulation.

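    #include <stdio.h>

    struct mySt {
      int i1, i2, i3, i4, sentinel;
    };

    int main(int argc, char** argv) {
      struct mySt s;
      int *p1, *p4;
      s.i1 = 1;
      s.i2 = 2;
      s.i3 = 3;
      s.i4 = 4;
      /* Treats the struct as a sequence of ints; relies on its memory layout. */
      p1 = ((int*) &s);
      p4 = &(s.sentinel);
      do {
        printf("field = %d\n", *p1);
        p1++;
      } while (p1 != p4);
    }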


Lists

Lists are one of the most ubiquitous constructed types in functional programming languages. Lists have two important differences when compared to tuples:

  • The type declaration does not predefine a number of elements.
  • All the elements in a list must have the same type.

Additionally, the only operations that are generally allowed on lists are:

  • To read the head element.
  • To read the tail of the list.

The program below illustrates the use of lists in SML:
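
The head and tail operations can be sketched in Python by modelling a list as either `None` (the empty list) or a pair `(head, tail)`. This encoding and the helper names `cons`, `head`, `tail`, `length` and `nth` are assumptions of this sketch, not the SML program.

```python
def cons(h, t):
    """Build a list cell from a head element and a tail list."""
    return (h, t)


def head(lst):
    return lst[0]


def tail(lst):
    return lst[1]


def length(lst):
    """Walk the list from its head, counting cells."""
    n = 0
    while lst is not None:
        n += 1
        lst = tail(lst)
    return n


def nth(lst, i):
    """Reach the i-th element by taking the tail i times."""
    for _ in range(i):
        lst = tail(lst)
    return head(lst)


numbers = cons(1, cons(2, cons(3, None)))
print(head(numbers), length(numbers), nth(numbers, 2))
```

With only these two operations, reaching an element deep in the list requires stepping through every cell before it.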



Arrays

Arrays are containers that store elements of the same data type. The elements stored in an array are indexed. Many programming languages use integer numbers as indexes, while others are more flexible and permit arrays to be indexed using a variety of types, such as integers, characters, enumerations, and so on. There are programming languages, such as Pascal, that let the developer choose the lowest and highest indexes of an array. As an example, the code snippet below, written in Pascal, could be used to count the frequency of letters that occur in a document:


The approach adopted in Pascal makes it unnecessary to define the array size, which is automatically determined by the compiler. In contrast to Pascal, every array in C is indexed starting at zero. The array above, if written in C, would look like:

In some programming languages arrays have fixed size. Once they are declared, this size cannot be changed. However, there exist programming languages that provide arrays of variable length. Python is an example:
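
A minimal sketch using Python's built-in list type, which grows and shrinks on demand:

```python
a = [1, 2, 3]

# The array grows as elements are appended.
a.append(4)
a.extend([5, 6])
length_after_growth = len(a)

# It also shrinks when elements are removed.
del a[0]

print(length_after_growth, a)
```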


Some programming languages support multi-dimensional arrays. C is an example. The program below declares a 2-dimensional array, fills it with integer numbers and then computes the sum of all its cells:

    #include <stdio.h>

    int main() {
      const int M = 5000;
      const int N = 1000;
      char m[M][N];
      int i, j;
      int sum = 0;
      // Initializes the array:
      for (i = 0; i < M; i++) {
        for (j = 0; j < N; j++) {
          m[i][j] = (i + j) & 7;
        }
      }
      // Sums up the matrix elements, row major:
      for (i = 0; i < M; i++) {
        for (j = 0; j < N; j++) {
          sum += m[i][j];
        }
      }
      printf("The sum is %d\n", sum);
    }

Unlike lists, arrays provide developers with random access. That is, it is possible to read or update the i-th element of an array in O(1). Notice that this operation is not usually allowed on lists. In that case, reading the i-th element requires traversing the list from its head, which is O(n) in the worst case for a list of n elements. Usually this restriction does not limit the efficiency of the algorithms that can be developed in array-free languages; however, there are some algorithms that cannot be implemented so efficiently in these languages. An example is the problem of determining if two strings are anagrams. In a programming language that provides random-access arrays, it is possible to implement this function in O(n), where n is the size of the largest string. Otherwise, the most efficient implementation will be O(n log n). The program below, written in Java, solves the anagram problem in linear time:

    public class Anagrams {
      public static void main(String args[]) {
        if (args.length != 2) {
          System.err.println("Syntax: java Anagrams s1 s2");
        } else {
          boolean yes = true;
          String s1 = args[0];
          String s2 = args[1];
          if (s1.length() != s2.length()) {
            yes = false;
          } else {
            // Counts how often each character occurs in s1.
            int table[] = new int[128];
            for (int i = 0; i < s1.length(); i++) {
              table[s1.charAt(i)]++;
            }
            // Consumes the counts with the characters of s2.
            for (int i = 0; i < s2.length(); i++) {
              if (table[s2.charAt(i)] == 0) {
                yes = false;
                break;
              } else {
                table[s2.charAt(i)]--;
              }
            }
          }
          System.out.println("Are they anagrams? " + yes);
        }
      }
    }

Associative Arrays

Some programming languages provide a special type of array, the dictionary, also known as the associative array, that allows one to map arbitrary objects to arbitrary values. Associative arrays are more common in dynamically typed languages. The program below, written in Python, illustrates the use of dictionaries:

    >>> d = {"Name": "Persival", "Age": 31}
    >>> d["Name"]
    'Persival'
    >>> d["Gender"] = "Male"
    >>> d
    {'Gender': 'Male', 'Age': 31, 'Name': 'Persival'}
    >>> d[0] = "Secret Info"
    >>> d
    {0: 'Secret Info', 'Gender': 'Male', 'Age': 31, 'Name': 'Persival'}


Classes

Classes are a special type of table that contains a reference to itself. In other words, a class is a type that "sees" itself. In terms of implementation, classes can be implemented as records, or as associative arrays. In Java or C++, for instance, classes are implemented as records, with a fixed number of elements. The program below shows an example of the implementation of a class:

    public class ConsCell<E> {
      private E head;
      private ConsCell<E> tail;

      public ConsCell(E h, ConsCell<E> t) {
        head = h;
        tail = t;
      }

      public E getHead() {
        return head;
      }

      public ConsCell<E> getTail() {
        return tail;
      }

      public void print() {
        System.out.println(head);
        if (tail != null) {
          tail.print();
        }
      }

      public int length() {
        if (tail == null) {
          return 1;
        } else {
          return 1 + tail.length();
        }
      }

      public static void main(String argc[]) {
        ConsCell<Integer> c1 = new ConsCell<Integer>(1, null);
        ConsCell<Integer> c2 = new ConsCell<Integer>(2, c1);
        ConsCell<Integer> c3 = new ConsCell<Integer>(3, c2);
        ConsCell<Integer> c4 = new ConsCell<Integer>(4, c3);
        System.out.println(c4.length());
        System.out.println("Printing c4");
        c4.print();
        System.out.println("Printing c2");
        c2.print();
      }
    }

In Java, the layout of a class cannot be modified. Once it is declared, new fields cannot be added. This restriction makes it easier to implement classes in this language efficiently. Calling a function that is declared within a class, also known as a "method", is an efficient operation: the target of a call can be found in O(1) time. Statically typed languages, such as C++, C# and OCaml, generally implement classes as records.

The other way to implement classes is as associative arrays. Dynamically typed languages, such as Python, JavaScript and Lua, implement classes in this way. The main advantage of this approach is flexibility: given that the class is a mutable table, new properties can be added to it at runtime, or removed from it. The program below shows how classes can be declared and used in Python:
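
A minimal sketch of such a class is given below, assuming a bit set. The names `Set`, `getIndex`, `capacity`, `vector` and `add` are taken from the fragment that appears later in this section; `INT_BITS`, `delete`, `contains` and the attribute `label` are assumptions made for this example.

```python
INT_BITS = 32  # assumed width of each word in the bit vector


def getIndex(element):
    """Map an element to (word index, bit mask) inside the bit vector."""
    index = element // INT_BITS
    bit = 1 << (element % INT_BITS)
    return (index, bit)


class Set:
    def __init__(self, capacity):
        self.capacity = capacity
        self.vector = [0] * (1 + capacity // INT_BITS)

    def add(self, element):
        (index, bit) = getIndex(element)
        self.vector[index] |= bit

    def delete(self, element):
        (index, bit) = getIndex(element)
        self.vector[index] &= ~bit

    def contains(self, element):
        (index, bit) = getIndex(element)
        return (self.vector[index] & bit) != 0


s = Set(60)
s.add(59)
s.add(60)
print(s.contains(59), s.contains(7))

# Both the class and its instances are backed by mutable tables.
s.label = "demo"            # adds a new entry to the instance's table
print("add" in vars(Set))   # methods are entries in the class's table
print(vars(s)["label"])
```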


Classes are not rigid structures in Python. Thus, it is possible to add properties to these types or remove properties from them. The program below illustrates how a new method can be added to our original Python example. The new method handles the possibility that an element outside the range supported by the bit set is added to it:

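    def errorAdd(self, element):
        if (element > self.capacity):
            raise IndexError(str(element) + " is out of range.")
        else:
            (index, bit) = getIndex(element)
            self.vector[index] |= bit
            print(element, "added successfully!")

    # Attaches the new method to the existing class at runtime.
    Set.errorAdd = errorAdd
    s = Set(60)
    s.errorAdd(59)
    s.add(60)
    s.errorAdd(61)  # raises IndexError: 61 is out of range.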


Unions

The union of many sets is the set of all elements that are in at least one of the base sets. The symbol ∪ is used to express this concept. In programming languages, this notion of union is applied to types: the union of several types is a new type that contains every element in each of the base types. Some programming languages force the developer to explicitly distinguish each base type from the others when using the union. Other languages, such as C, trust the programmer to use each element of the union correctly. The code below shows how unions are declared and used in C:

    #include <stdio.h>

    union element {
      int i;
      float f;
    };

    int main() {
      union element e;
      e.f = 177689982.02993F;
      printf("Int = %d\n", e.i);
      printf("Float = %f\n", e.f);
      e.f = 0.0F;
    }

This example shows that any of several representations or formats can be associated with one single variable. The code above associates int and float with the same element. So, the union element consists of a variable which may hold an int or a float. Notice that in the program above we have initialized the field f of the union e; however, we have used its field i. C does not require the programmer to use the base type as it has been declared in the union. A union, in C, is only a chunk of data stored in memory. How this data is interpreted depends on which operation the programmer wants to apply to it. If a union U is made of several base types T1, ..., Tn, then the compiler reserves in memory the area of the largest base type to allocate any instance of U.
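
Both properties, reinterpretation of the stored bits and storage sized by the largest member, can be observed from Python through the standard `ctypes` module, which mirrors C unions. This is a sketch under the assumption of a common platform where `float` is a 32-bit IEEE 754 value; the class names `Element` and `Mixed` are hypothetical.

```python
import ctypes
import struct


# Mirrors a C union holding either a 32-bit integer or a float.
class Element(ctypes.Union):
    _fields_ = [("i", ctypes.c_int32), ("f", ctypes.c_float)]


e = Element()
e.f = 1.0

# Reading the other field reinterprets the same bytes as an integer.
bits = e.i
expected = struct.unpack("=i", struct.pack("=f", 1.0))[0]


# A union with members of different sizes.
class Mixed(ctypes.Union):
    _fields_ = [
        ("c", ctypes.c_char),
        ("i", ctypes.c_int32),
        ("d", ctypes.c_double),
    ]


# The storage reserved is that of the largest member.
size = ctypes.sizeof(Mixed)

print(hex(bits), size)
```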

There are programming languages that force the programmer to explicitly distinguish each base type of a union. SML fits in this family of languages. In SML unions are labeled, and if programmers want to declare a particular subtype of a union, they need to explicitly mention the label:


The labels allow us to know at runtime the type of the data stored in a variable. Because C does not have this concept of labels associated with unions, runtime type inspection is not possible in that language. Labels also ensure that all the sets in a union are disjoint. Therefore, the cardinality of a union is the sum of the cardinalities of each base set.
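
The role of labels can be sketched in Python by wrapping each base type in its own class, so that the class plays the part of the label. The names `IntElem`, `RealElem`, `Element` and `describe` are hypothetical, and the runtime `isinstance` test stands in for SML's pattern matching.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class IntElem:
    value: int


@dataclass
class RealElem:
    value: float


# The union type: every value carries the label of its base type.
Element = Union[IntElem, RealElem]


def describe(e: Element) -> str:
    """Inspect the label at runtime to decide how to treat the value."""
    if isinstance(e, IntElem):
        return f"integer {e.value}"
    if isinstance(e, RealElem):
        return f"real {e.value}"
    raise TypeError("not an Element")


print(describe(IntElem(3)))
print(describe(RealElem(2.5)))

# Labels keep the base sets disjoint, even for equal underlying values.
print(IntElem(1) == RealElem(1.0))
```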

Function Types

Functions map a given set, the domain, into another set, the range (or image). Hence, the type of a function specifies the function's domain and range. This idea is analogous to the mathematical notation A → B, which refers to the set of all functions with inputs from the set A and outputs from the set B. Many programming languages have this kind of construction. Below we show an example of a function declaration. The domain of the function is a pair of real numbers and the image is a real number, i.e., the function has type real × real → real.
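
As a sketch in Python, type annotations can spell out the same domain and range. The alias `BinaryReal` and the functions `add` and `apply` are hypothetical names; Python records these annotations but does not enforce them at runtime.

```python
from typing import Callable, Tuple

# The type of functions from a pair of reals to a real.
BinaryReal = Callable[[float, float], float]


def add(x: float, y: float) -> float:
    return x + y


def apply(f: BinaryReal, pair: Tuple[float, float]) -> float:
    """Apply a function of type real x real -> real to a pair."""
    x, y = pair
    return f(x, y)


print(apply(add, (1.5, 2.0)))
print(add.__annotations__)
```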


