| .NH 2
 | |
| Representation of complex data structures in a sequential file
 | |
| .PP
 | |
| Most programmers are quite used to dealing with
 | |
| complex data structures, such as
 | |
| arrays, graphs and trees.
 | |
| There are some particular problems that occur
 | |
| when storing such a data structure
 | |
| in a sequential file.
 | |
| We call data that is kept in
 | |
| main memory
 | |
| .UL internal ,
 | |
| as opposed to
 | |
| .UL external
 | |
| data
 | |
| that is kept in a file outside the program.
 | |
| .sp
 | |
| We assume that a simple data item of a
 | |
| scalar type (integer, floating point number)
 | |
| has some known external representation.
 | |
| An
 | |
| .UL array
 | |
| having elements of a scalar type can easily be represented
 | |
| externally, by representing its elements
 | |
| one after another.
 | |
| The external representation may be preceded by a
 | |
| number, giving the length of the array.
 | |
| Now, consider a linear, singly linked list,
 | |
| the elements of which look like:
 | |
| .DS
 | |
| record
 | |
|         data: scalar_type;
 | |
|         next: pointer_type;
 | |
| end;
 | |
| .DE
 | |
| It is important to note that the "next"
 | |
| fields of the elements only have a meaning within
 | |
| main memory.
 | |
| The field contains the address of some location in
 | |
| main memory.
 | |
| If a list element is written to a file in
 | |
| some program,
 | |
| and read by another program,
 | |
| the element will in general be allocated at a different
 | |
| address in main memory.
 | |
| Hence this address value is completely
 | |
| useless outside the program.
 | |
| .sp
 | |
| One may represent the list by ignoring these "next" fields
 | |
| and storing the data items in the order they are linked.
 | |
| The "next" fields are represented \fIimplicitly\fR.
 | |
| When the file is read again,
 | |
| the same list can be reconstructed.
 | |
| In order to know where the external representation of the
 | |
| list ends,
 | |
| it may be useful to put the length of
 | |
| the list in front of it.
 | |
| .sp
 | |
| Note that arrays and linear lists have the
 | |
| same external representation.
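The scheme above can be sketched in Python as follows.
This is only an illustration under stated assumptions: the sequential file is modelled as a plain list of integers, and the names Element, write_list and read_list are hypothetical, not taken from any existing program.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Element:
    data: int
    next: Optional["Element"] = None  # only meaningful in main memory


def write_list(head: Optional[Element]) -> List[int]:
    """Return the external form: the length, then the data in link order."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.next
    return [len(items)] + items


def read_list(external: List[int]) -> Optional[Element]:
    """Rebuild the list; the "next" fields follow from the item order."""
    length = external[0]
    head = None
    # Build from the back, so each new element can point at its successor.
    for value in reversed(external[1:1 + length]):
        head = Element(value, head)
    return head
```

An array holding the same items would be written as the same sequence, which is why the external form alone does not tell the reader whether it came from an array or a list.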
 | |
| .PP
 | |
| A doubly linked, linear list,
 | |
| with elements of the type:
 | |
| .DS
 | |
| record
 | |
|         data: scalar_type;
 | |
|         next,
 | |
|         previous: pointer_type;
 | |
| end
 | |
| .DE
 | |
| can be represented in precisely the same way.
 | |
| Both the "next" and the "previous" fields are represented
 | |
| implicitly.
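As a minimal sketch of how both pointer fields can be recovered when reading, assume again that the external form is a list of integers starting with the length.
The names DElement and read_doubly_linked are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class DElement:
    data: int
    next: Optional["DElement"] = None
    previous: Optional["DElement"] = None


def read_doubly_linked(external: List[int]) -> Optional[DElement]:
    """Rebuild a doubly linked list from [length, item, item, ...]."""
    length = external[0]
    head = None
    tail = None
    for value in external[1:1 + length]:
        node = DElement(value, previous=tail)  # "previous" from read order
        if tail is None:
            head = node
        else:
            tail.next = node  # "next" from read order as well
        tail = node
    return head
```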
 | |
| .PP
 | |
| Next, consider a binary tree,
 | |
| the nodes of which have type:
 | |
| .DS
 | |
| record
 | |
|         data: scalar_type;
 | |
|         left,
 | |
|         right: pointer_type;
 | |
| end
 | |
| .DE
 | |
| Such a tree can be represented sequentially,
 | |
| by storing its nodes in some fixed order, e.g. prefix order.
 | |
| A special null data item may be used to
 | |
| denote a missing left or right son.
 | |
| For example, let the scalar type be integer,
 | |
| and let the null item be 0.
 | |
| Then the tree of fig. 3.1(a)
 | |
| can be represented as in fig. 3.1(b).
 | |
| .DS
 | |
|                         4
 | |
| 
 | |
|                     9      12
 | |
| 
 | |
|                 12    3   4   6
 | |
| 
 | |
|                      8  1  5 1
 | |
| 
 | |
| Fig. 3.1(a) A binary tree
 | |
| 
 | |
| 
 | |
| 4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0
 | |
| 
 | |
| Fig. 3.1(b) Its sequential representation
 | |
| .DE
 | |
| We are still able to represent the pointer fields ("left"
 | |
| and "right") implicitly.
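The prefix-order scheme can be sketched in Python as follows, using the tree of fig. 3.1(a) as input.
The sequential file is modelled as a list of integers, and the names Node, write_tree, read_tree and NULL_ITEM are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

NULL_ITEM = 0  # as in the text; real data must therefore never be 0


@dataclass(eq=False)
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def write_tree(node: Optional[Node]) -> List[int]:
    """Prefix order: the node itself, then its left and right subtrees."""
    if node is None:
        return [NULL_ITEM]
    return [node.data] + write_tree(node.left) + write_tree(node.right)


def read_tree(items: Iterator[int]) -> Optional[Node]:
    """Consume one subtree from the stream and return its root."""
    value = next(items)
    if value == NULL_ITEM:
        return None
    node = Node(value)
    node.left = read_tree(items)
    node.right = read_tree(items)
    return node


# The tree of fig. 3.1(a).
tree = Node(4,
            Node(9, Node(12), Node(3, Node(8), Node(1))),
            Node(12, Node(4, None, Node(5)), Node(6, Node(1), None)))
external = write_tree(tree)
```

Here external holds the sequence of fig. 3.1(b), and read_tree(iter(external)) rebuilds a tree of the same shape.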
 | |
| .PP
 | |
| Finally, consider a general
 | |
| .UL graph ,
 | |
| where each node has a "data" field and
 | |
| pointer fields,
 | |
| with no restriction on where they may point to.
 | |
| Here the implicit approach breaks down.
 | |
| There is no way to represent the pointers implicitly,
 | |
| as we did with lists and trees.
 | |
| In order to represent them explicitly,
 | |
| we use the following scheme.
 | |
| Every node gets an extra field,
 | |
| containing some unique number that identifies the node.
 | |
| We call this number its
 | |
| .UL id .
 | |
| A pointer is represented externally as the id of the node
 | |
| it points to.
 | |
| When reading the file we use a table that maps
 | |
| an id to the address of its node.
 | |
| In general this table will not be completely filled in
 | |
| until we have read the entire external representation of
 | |
| the graph and allocated internal memory locations for
 | |
| every node.
 | |
| Hence we cannot reconstruct the graph in one scan.
 | |
| That is, there may be some pointers from node A to B,
 | |
| where B is placed later in the sequential file than A.
 | |
| When we read node A we cannot map the id of B
 | |
| to the address of node B,
 | |
| as we have not yet allocated node B.
 | |
| We can overcome this problem if the size
 | |
| of every node is known in advance.
 | |
| In this case we can allocate memory for a node
 | |
| on first reference.
 | |
| Otherwise, the mapping from id to pointer
 | |
| cannot be done while reading nodes.
 | |
| The mapping can be done either in an extra scan
 | |
| or at every reference to the node.
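Both reading strategies can be sketched in Python as follows.
The assumptions are ours: a node is written as a tuple (id, data, ids of the nodes it points to), ids are simply the positions of the nodes in the list handed to the writer, and every node that is pointed to appears in that list.
The names GraphNode, write_graph, read_graph and read_graph_two_scans are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# External form of one node: (id, data, ids of the nodes it points to).
Record = Tuple[int, Optional[int], List[int]]


class GraphNode:
    def __init__(self) -> None:
        self.data: Optional[int] = None
        self.pointers: List["GraphNode"] = []


def write_graph(nodes: List[GraphNode]) -> List[Record]:
    """Number the nodes from 1 and write every pointer as an id."""
    ids = {node: number for number, node in enumerate(nodes, start=1)}
    return [(ids[node], node.data, [ids[target] for target in node.pointers])
            for node in nodes]


def read_graph(records: List[Record]) -> Dict[int, GraphNode]:
    """One scan: a node is allocated the first time its id is seen."""
    table: Dict[int, GraphNode] = {}

    def lookup(number: int) -> GraphNode:
        if number not in table:
            table[number] = GraphNode()  # allocated on first reference
        return table[number]

    for number, data, targets in records:
        node = lookup(number)
        node.data = data
        node.pointers = [lookup(target) for target in targets]
    return table


def read_graph_two_scans(records: List[Record]) -> Dict[int, GraphNode]:
    """Extra scan: allocate all nodes first, then map ids to nodes."""
    table = {number: GraphNode() for number, _, _ in records}
    for number, data, targets in records:
        table[number].data = data
        table[number].pointers = [table[target] for target in targets]
    return table
```

The one-scan reader works here because an empty GraphNode can be created before its contents are known, which plays the role of knowing the node size in advance.
The two-scan reader does not need that property, at the cost of going through the records twice.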
 |