.NH 2
Representation of complex data structures in a sequential file
.PP
Most programmers are quite used to dealing with
complex data structures, such as
arrays, graphs and trees.
Some particular problems occur
when such a data structure is stored
in a sequential file.
We call data that is kept in
main memory
.UL internal ,
as opposed to
.UL external
data
that is kept in a file outside the program.
.sp
We assume that a simple data item of a
scalar type (integer, floating point number)
has some known external representation.
An
.UL array
having elements of a scalar type can easily be represented
externally by successively
representing its elements.
The external representation may be preceded by a
number giving the length of the array.
Now, consider a linear, singly linked list,
the elements of which look like:
.DS
record
    data: scalar_type;
    next: pointer_type;
end;
.DE
It is important to note that the "next"
fields of the elements only have a meaning within
main memory.
Such a field contains the address of some location in
main memory.
If a list element is written to a file by
some program
and read by another program,
the element will in general be allocated at a different
address in main memory.
Hence the address value is completely
useless outside the program.
.sp
One may represent the list by ignoring these "next" fields
and storing the data items in the order in which they are linked.
The "next" fields are then represented \fIimplicitly\fR.
When the file is read again,
the same list can be reconstructed.
In order to know where the external representation of the
list ends,
it may be useful to put the length of
the list in front of it.
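.PP
The scheme just described can be sketched as follows.
This is only an illustration, not part of the original design:
the names Element, write_list and read_list are hypothetical,
and the sequential file is modelled as a plain Python list of integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    data: int
    next: Optional["Element"] = None


def write_list(head):
    """Return the external form: the length, then the data items in link order."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)   # the "next" field itself is never written
        node = node.next
    return [len(items)] + items


def read_list(external):
    """Rebuild the list; the "next" fields follow from the order of the items."""
    length = external[0]
    items = external[1:1 + length]
    head = None
    for value in reversed(items):
        head = Element(value, head)   # freshly allocated, so addresses differ
    return head


original = Element(7, Element(3, Element(5)))
external = write_list(original)
rebuilt = read_list(external)
```

The rebuilt list holds the same data in the same order,
although its elements live at other addresses than those of the original.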
.sp
Note that arrays and linear lists have the
same external representation.
.PP
A doubly linked, linear list,
with elements of the type:
.DS
record
    data: scalar_type;
    next,
    previous: pointer_type;
end;
.DE
can be represented in precisely the same way.
Both the "next" and the "previous" fields are represented
implicitly.
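.PP
As a sketch of how both pointer fields can be recovered while reading,
consider the fragment below.
The names Node and read_doubly_linked are hypothetical,
and the external form is assumed to be a length followed by the data items.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.previous = None


def read_doubly_linked(external):
    """Rebuild a doubly linked list from [length, item, item, ...]."""
    length = external[0]
    head = tail = None
    for value in external[1:1 + length]:
        node = Node(value)
        node.previous = tail      # implied by the item read just before
        if tail is None:
            head = node
        else:
            tail.next = node      # implied by the item read just after
        tail = node
    return head, tail


head, tail = read_doubly_linked([3, 7, 3, 5])

forward = []
node = head
while node is not None:
    forward.append(node.data)
    node = node.next

backward = []
node = tail
while node is not None:
    backward.append(node.data)
    node = node.previous
```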
.PP
Next, consider a binary tree,
the nodes of which have type:
.DS
record
    data: scalar_type;
    left,
    right: pointer_type;
end;
.DE
Such a tree can be represented sequentially
by storing its nodes in some fixed order, e.g. prefix order
(a node first, then its left subtree, then its right subtree).
A special null data item, which must not occur as a real data value,
may be used to
denote a missing left or right son.
For example, let the scalar type be integer,
and let the null item be 0.
Then the tree of fig. 3.1(a)
can be represented as in fig. 3.1(b).
.DS
.ft 5
        4
      /   \e
    9        12
   / \e     /   \e
 12   3   4     6
     / \e   \e   /
    8   1   5 1
.ft R

Fig. 3.1(a) A binary tree


.ft 5
4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0
.ft R

Fig. 3.1(b) Its sequential representation
.DE
We are still able to represent the pointer fields ("left"
and "right") implicitly.
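.PP
The prefix-order representation can be sketched as below.
The names TreeNode, write_tree and read_tree are hypothetical,
the null item is assumed to be 0 as in the example,
and the sequence of fig. 3.1(b) is used as input.

```python
NULL_ITEM = 0   # assumed never to occur as a real data value


class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def write_tree(node, out):
    """Append the prefix-order representation of the tree to the list out."""
    if node is None:
        out.append(NULL_ITEM)        # marks a missing son
        return
    out.append(node.data)
    write_tree(node.left, out)
    write_tree(node.right, out)


def read_tree(items):
    """Rebuild a tree from an iterator over its prefix-order representation."""
    value = next(items)
    if value == NULL_ITEM:
        return None
    node = TreeNode(value)
    node.left = read_tree(items)     # the left subtree follows the node
    node.right = read_tree(items)    # the right subtree follows the left one
    return node


external = [4, 9, 12, 0, 0, 3, 8, 0, 0, 1, 0, 0,
            12, 4, 0, 5, 0, 0, 6, 1, 0, 0, 0]
root = read_tree(iter(external))

round_trip = []
write_tree(root, round_trip)
```

Reading the sequence and writing the resulting tree again yields the same sequence,
so no pointer value ever has to appear in the file.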
.PP
Finally, consider a general
.UL graph ,
where each node has a "data" field and
pointer fields,
with no restriction on where they may point.
Here the implicit representation reaches its limits:
there is no way to represent the pointers implicitly,
as we did with lists and trees.
In order to represent them explicitly,
we use the following scheme.
Every node gets an extra field,
containing some unique number that identifies the node.
We call this number its
.UL id .
A pointer is represented externally as the id of the node
it points to.
When reading the file we use a table that maps
an id to the address of its node.
In general this table will not be completely filled in
until we have read the entire external representation of
the graph and allocated internal memory locations for
every node.
Hence we cannot, in general, reconstruct the graph in one scan.
That is, there may be a pointer from node A to node B,
where B is placed after A in the sequential file.
When we read node A we cannot map the id of B
to the address of node B,
as we have not yet allocated node B.
We can overcome this problem if the size
of every node is known in advance.
In this case we can allocate memory for a node
on first reference.
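.PP
Allocation on first reference might look like the sketch below.
It is an illustration under assumptions:
the names GraphNode and read_graph_one_scan are hypothetical,
each external record is taken to be a triple (id, data, list of pointer ids),
and all nodes are of one kind, so that a node can be allocated before its record is read.

```python
class GraphNode:
    def __init__(self, node_id):
        self.id = node_id
        self.data = None
        self.pointers = []


def read_graph_one_scan(records):
    """Rebuild a graph in one scan, allocating each node on first reference."""
    table = {}   # maps an id to its (possibly not yet filled in) node

    def node_for(node_id):
        if node_id not in table:
            table[node_id] = GraphNode(node_id)   # allocated before being read
        return table[node_id]

    for node_id, data, pointer_ids in records:
        node = node_for(node_id)
        node.data = data
        node.pointers = [node_for(p) for p in pointer_ids]
    return table


# Node 1 points forward to nodes 2 and 3, which appear later in the file.
records = [(1, "a", [2, 3]), (2, "b", [3]), (3, "c", [1])]
graph = read_graph_one_scan(records)
```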
If the node sizes are not known in advance, the mapping from id to pointer
cannot be done while reading nodes.
The mapping can then be done either in an extra scan
or at every reference to the node.
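.PP
The variant with an extra scan can be sketched as follows.
The names Vertex and read_graph_two_scans are hypothetical,
each external record is again assumed to be a triple (id, data, list of pointer ids),
and the extra scan is assumed to run over the nodes already in memory rather than over the file.

```python
class Vertex:
    def __init__(self, data, pointers):
        self.data = data
        self.pointers = pointers   # ids after the first scan, nodes after the second


def read_graph_two_scans(records):
    """Rebuild a graph: first allocate all nodes, then map ids to nodes."""
    table = {}
    # First scan: allocate every node and keep its pointers as ids.
    for node_id, data, pointer_ids in records:
        table[node_id] = Vertex(data, list(pointer_ids))
    # Extra scan: the table is complete now, so every id can be mapped.
    for vertex in table.values():
        vertex.pointers = [table[p] for p in vertex.pointers]
    return table


records = [(1, "a", [2, 3]), (2, "b", [3]), (3, "c", [1])]
graph = read_graph_two_scans(records)
```

Mapping at every reference would instead keep the ids in the pointer fields
and consult the table each time a pointer is followed.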