150 lines
		
	
	
	
		
			4.2 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			150 lines
		
	
	
	
		
			4.2 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
.NH 2
 | 
						|
Representation of complex data structures in a sequential file
 | 
						|
.PP
 | 
						|
Most programmers are quite used to deal with
 | 
						|
complex data structures, such as
 | 
						|
arrays, graphs and trees.
 | 
						|
There are some particular problems that occur
 | 
						|
when storing such a data structure
 | 
						|
in a sequential file.
 | 
						|
We call data that is kept in
 | 
						|
main memory
 | 
						|
.UL internal
 | 
						|
,as opposed to
 | 
						|
.UL external
 | 
						|
data
 | 
						|
that is kept in a file outside the program.
 | 
						|
.sp
 | 
						|
We assume a simple data structure of a
 | 
						|
scalar type (integer, floating point number)
 | 
						|
has some known external representation.
 | 
						|
An
 | 
						|
.UL array
 | 
						|
having elements of a scalar type can be represented
 | 
						|
externally easily, by successively
 | 
						|
representing its elements.
 | 
						|
The external representation may be preceded by a
 | 
						|
number, giving the length of the array.
 | 
						|
Now, consider a linear, singly linked list,
 | 
						|
the elements of which look like:
 | 
						|
.DS
 | 
						|
record
 | 
						|
        data: scalar_type;
 | 
						|
        next: pointer_type;
 | 
						|
end;
 | 
						|
.DE
 | 
						|
It is significant to note that the "next"
 | 
						|
fields of the elements only have a meaning within
 | 
						|
main memory.
 | 
						|
The field contains the address of some location in
 | 
						|
main memory.
 | 
						|
If a list element is written to a file in
 | 
						|
some program,
 | 
						|
and read by another program,
 | 
						|
the element will be allocated at a different
 | 
						|
address in main memory.
 | 
						|
Hence this address value is completely
 | 
						|
useless outside the program.
 | 
						|
.sp
 | 
						|
One may represent the list by ignoring these "next" fields
 | 
						|
and storing the data items in the order they are linked.
 | 
						|
The "next" fields are represented \fIimplicitly\fR.
 | 
						|
When the file is read again,
 | 
						|
the same list can be reconstructed.
 | 
						|
In order to know where the external representation of the
 | 
						|
list ends,
 | 
						|
it may be useful to put the length of
 | 
						|
the list in front of it.
 | 
						|
.sp
 | 
						|
Note that arrays and linear lists have the
 | 
						|
same external representation.
 | 
						|
.PP
 | 
						|
A doubly linked, linear list,
 | 
						|
with elements of the type:
 | 
						|
.DS
 | 
						|
record
 | 
						|
        data: scalar_type;
 | 
						|
        next,
 | 
						|
        previous: pointer_type;
 | 
						|
end
 | 
						|
.DE
 | 
						|
can be represented in precisely the same way.
 | 
						|
Both the "next" and the "previous" fields are represented
 | 
						|
implicitly.
 | 
						|
.PP
 | 
						|
Next, consider a binary tree,
 | 
						|
the nodes of which have type:
 | 
						|
.DS
 | 
						|
record
 | 
						|
        data: scalar_type;
 | 
						|
        left,
 | 
						|
        right: pointer_type;
 | 
						|
end
 | 
						|
.DE
 | 
						|
Such a tree can be represented sequentially,
 | 
						|
by storing its nodes in some fixed order, e.g. prefix order.
 | 
						|
A special null data item may be used to
 | 
						|
denote a missing left or right son.
 | 
						|
For example, let the scalar type be integer,
 | 
						|
and let the null item be 0.
 | 
						|
Then the tree of fig. 3.1(a)
 | 
						|
can be represented as in fig. 3.1(b).
 | 
						|
.DS
 | 
						|
.ft 5
 | 
						|
                        4
 | 
						|
                      /   \e
 | 
						|
                    9      12
 | 
						|
                  /  \e    /  \e
 | 
						|
                12    3   4   6
 | 
						|
                     / \e  \e  /
 | 
						|
                     8  1  5 1
 | 
						|
.ft R
 | 
						|
 | 
						|
Fig. 3.1(a) A binary tree
 | 
						|
 | 
						|
 | 
						|
.ft 5
 | 
						|
4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0
 | 
						|
.ft R
 | 
						|
 | 
						|
Fig. 3.1(b) Its sequential representation
 | 
						|
.DE
 | 
						|
We are still able to represent the pointer fields ("left"
 | 
						|
and "right") implicitly.
 | 
						|
.PP
 | 
						|
Finally, consider a general
 | 
						|
.UL graph
 | 
						|
, where each node has a "data" field and
 | 
						|
pointer fields,
 | 
						|
with no restriction on where they may point to.
 | 
						|
Now we're at the end of our tale.
 | 
						|
There is no way to represent the pointers implicitly,
 | 
						|
like we did with lists and trees.
 | 
						|
In order to represent them explicitly,
 | 
						|
we use the following scheme.
 | 
						|
Every node gets an extra field,
 | 
						|
containing some unique number that identifies the node.
 | 
						|
We call this number its
 | 
						|
.UL id.
 | 
						|
A pointer is represented externally as the id of the node
 | 
						|
it points to.
 | 
						|
When reading the file we use a table that maps
 | 
						|
an id to the address of its node.
 | 
						|
In general this table will not be completely filled in
 | 
						|
until we have read the entire external representation of
 | 
						|
the graph and allocated internal memory locations for
 | 
						|
every node.
 | 
						|
Hence we cannot reconstruct the graph in one scan.
 | 
						|
That is, there may be some pointers from node A to B,
 | 
						|
where B is placed after A in the sequential file than A.
 | 
						|
When we read the node of A we cannot map the id of B
 | 
						|
to the address of node B,
 | 
						|
as we have not yet allocated node B.
 | 
						|
We can overcome this problem if the size
 | 
						|
of every node is known in advance.
 | 
						|
In this case we can allocate memory for a node
 | 
						|
on first reference.
 | 
						|
Else, the mapping from id to pointer
 | 
						|
cannot be done while reading nodes.
 | 
						|
The mapping can be done either in an extra scan
 | 
						|
or at every reference to the node.
 |