When developing a fast compiler for the Sun-4 series we have encountered
rather strange behavior of the Sun kernel.

The problem is that when you have lots of nested procedure calls, (as
is often the case in compilers and parsers) the registers fill up which
causes a kernel trap. The kernel will then write out some of the registers
to memory to make room for another window. When you return from the nested
procedure call, just the reverse happens: yet another kernel trap so the
kernel can load the register from memory.

Unfortunately the kernel only saves or loads a single window (= 16 register)
on each trap. This means that when you call a procedure recursively it causes
a kernel trap on almost every invocation (except for the first few).

To illustrate this consider the following little program:

--------------- little program -------------
f(i)	/* calls itself i times */
int i;
{
  if (i)
	f(i-1);
}

main(argc, argv)
int argc;
char *argv[];
{


  i = atoi(argv[1]);	/* # loops */
  j = atoi(argv[2]);	/* depth */

  while (i--)
	f(j);
}
------------ end of little program -----------


The performance decreases abruptly when the depth (j) becomes larger
than 5. On a SPARC station we got the following results:

	depth	run time (in seconds)

	1	 0.5
	2	 0.8
	3	 1.0
	4	 1.4	<- from here on it's +6 seconds for each
	5	 7.6		step deeper.
	6	13.9
	7	19.9
	8	26.3
	9	32.9

Things would be a lot better when instead of just 1, the kernel would
save or restore 4 windows (= 64 registers = 50% on our SPARC stations).

	-Raymond.