.NH 2 Representation of complex data structures in a sequential file .PP Most programmers are quite used to deal with complex data structures, such as arrays, graphs and trees. There are some particular problems that occur when storing such a data structure in a sequential file. We call data that is kept in main memory .UL internal ,as opposed to .UL external data that is kept in a file outside the program. .sp We assume a simple data structure of a scalar type (integer, floating point number) has some known external representation. An .UL array having elements of a scalar type can be represented externally easily, by successively representing its elements. The external representation may be preceded by a number, giving the length of the array. Now, consider a linear, singly linked list, the elements of which look like: .DS record data: scalar_type; next: pointer_type; end; .DE It is significant to note that the "next" fields of the elements only have a meaning within main memory. The field contains the address of some location in main memory. If a list element is written to a file in some program, and read by another program, the element will be allocated at a different address in main memory. Hence this address value is completely useless outside the program. .sp One may represent the list by ignoring these "next" fields and storing the data items in the order they are linked. The "next" fields are represented \fIimplicitly\fR. When the file is read again, the same list can be reconstructed. In order to know where the external representation of the list ends, it may be useful to put the length of the list in front of it. .sp Note that arrays and linear lists have the same external representation. .PP A doubly linked, linear list, with elements of the type: .DS record data: scalar_type; next, previous: pointer_type; end .DE can be represented in precisely the same way. Both the "next" and the "previous" fields are represented implicitly. .PP Next, consider a binary tree, the nodes of which have type: .DS record data: scalar_type; left, right: pointer_type; end .DE Such a tree can be represented sequentially, by storing its nodes in some fixed order, e.g. prefix order. A special null data item may be used to denote a missing left or right son. For example, let the scalar type be integer, and let the null item be 0. Then the tree of fig. 3.1(a) can be represented as in fig. 3.1(b). .DS 4 9 12 12 3 4 6 8 1 5 1 Fig. 3.1(a) A binary tree 4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0 Fig. 3.1(b) Its sequential representation .DE We are still able to represent the pointer fields ("left" and "right") implicitly. .PP Finally, consider a general .UL graph , where each node has a "data" field and pointer fields, with no restriction on where they may point to. Now we're at the end of our tale. There is no way to represent the pointers implicitly, like we did with lists and trees. In order to represent them explicitly, we use the following scheme. Every node gets an extra field, containing some unique number that identifies the node. We call this number its .UL id. A pointer is represented externally as the id of the node it points to. When reading the file we use a table that maps an id to the address of its node. In general this table will not be completely filled in until we have read the entire external representation of the graph and allocated internal memory locations for every node. Hence we cannot reconstruct the graph in one scan. That is, there may be some pointers from node A to B, where B is placed after A in the sequential file than A. When we read the node of A we cannot map the id of B to the address of node B, as we have not yet allocated node B. We can overcome this problem if the size of every node is known in advance. In this case we can allocate memory for a node on first reference. Else, the mapping from id to pointer cannot be done while reading nodes. The mapping can be done either in an extra scan or at every reference to the node.