.so init
.nr H1 4
.NH
FUTURE WORK
.NH 2
A critique of EM
.PP
In general, EM fits its purpose quite well.
Numerous compilers have been written using EM as their intermediate
language and it has even become a commercial product.
A great deal of its success is probably due to its simplicity.
There are no extravagant instructions, but it does provide all the
functionality needed to write a decent compiler.
.PP
There are, however, a few instructions that come rather close to being
extravagant.
The \*(Silar\*(So instruction, for example \(em used to fetch an element
from an array \(em does not make it much easier to write a frontend,
but does make it unnecessarily hard to write an efficient backend.
Other instructions for which it is difficult to generate efficient code
are those that permit dynamic operators, such as the \*(Silos\*(So.
Dynamic operators, however, provide significant extra possibilities and
can therefore not be disposed of.
Note that even though the array operations \*(Silar\*(So and
\*(Sisar\*(So provide dynamic operators, they do not add additional
power, since they can easily be replaced with a sequence using the
\*(Silos\*(So or \*(Sists\*(So instructions.
.PP
EM code to reference arrays generated by the C frontend can be
translated very efficiently for almost any processor.
However, the same operation generated by the Modula-2 frontend (which
uses the \*(Silar\*(So) is much less efficient, although the only
difference is that the latter performs range checking whereas the
former does not.\(dg
.FS
\(dg Actually this depends on whether or not explicit range checking is
enabled.
This clearly shows that the current code generators are not optimal and
often depend on ad-hoc decisions.
.FE
Since range checking can also be expressed explicitly in EM
(\*(Sirck\*(So), there is no need for any of the array operations
(\*(Siaar\*(So, \*(Silar\*(So and \*(Sisar\*(So).
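To make this concrete, the sketch below shows roughly how a load of
\*(Sia[i]\*(So (4-byte elements, lower bound zero) could be written in
EM: first with the \*(Silar\*(So instruction and an array descriptor,
then as an explicit sequence that performs the range check with
\*(Sirck\*(So.
The local offsets, the labels \*(Si.desc\*(So and \*(Si.bnds\*(So and
the descriptor layouts are merely illustrative assumptions; the precise
operand order is as given by the official EM definition.
.DS
! with the array instruction, as generated by the Modula-2 frontend
lal   -104      ! array base address (assumed local offset)
lol   -4        ! index i
lae   .desc     ! array descriptor: bounds and element size
lar   4         ! load the element, checking i against the bounds

! the same load written out, with an explicit rck for the check
lal   -104      ! array base address
lol   -4        ! index i
lae   .bnds     ! bound descriptor for the range check
rck   4         ! trap if i is out of range; i stays on the stack
loc   4         ! element size in bytes
mli   4         ! byte offset = i * 4
ads   4         ! add the offset to the base address
loi   4         ! load the 4-byte element
.DE
Without range checking the \*(Silae .bnds\*(So and \*(Sirck\*(So lines
simply disappear, and what remains is essentially the code the C
frontend produces.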
.PP
Besides the efficiency of the array operations themselves, there is
another major disadvantage to using them.
In sharp contrast to all other EM instructions except the \*(Silos\*(So
and the \*(Sists\*(So, they allow dynamic operators, so their effect on
the stack pointer cannot always be determined at compile time.
This means that efficient caching of the top of the stack in registers
is almost impossible, so using these array operations also affects the
efficiency of the surrounding code.
Now that processors are produced with more and more registers, it could
be very beneficial to cache the top of the stack, so that the
memory/register reference ratio decreases to the benefit of the overall
performance.
.PP
As a final critique, we would also like to discuss the semantics of
some of the EM instructions.
In
.[ [
Description of a Machine Architecture
.]]
it is said that all signed instructions, such as the \*(Siadi\*(So,
should cause an exception on overflow.
The unsigned operations such as the \*(Siadu\*(So, however, should act
as modulo operations and therefore not perform overflow checking.
Since it is very expensive to perform overflow checking in EM, we
suggest that the backend take care of this.
For languages that do not require overflow checking, a simple message
could be generated to disable overflow checking in backends.
This way all backends could be written to fully comply with the
official EM definition without any reduction in efficiency.\(dd
.FS
\(dd Currently many backends do not implement error checks because they
are too expensive and almost never needed.
Some frontends even have facilities built in to generate EM code to
force these checks.
If this trend continues we will end up with a de facto and a de jure
standard, both developed by the same people but nonetheless
incompatible.
.FE
When such messages are added, we would like to suggest that they be
able to enforce overflow checks on unsigned as well as signed
arithmetic.
.PP
In conclusion, we would like to suggest removing the array operations
from EM, or at least discontinuing their use in frontends.
.NH 2
\*(OQWanted: Procedure call information\*(CQ
.PP
The advantage of an intermediate language such as EM is that the
backend no longer has to know about any 'quirks' of the 'input'
language.
The major disadvantage, however, is that the backend no longer knows
about any 'quirks' of the 'input' language...
If the SPARC backend is ever to compete with Sun's own C compiler, for
example, removal of the array operations will not be enough.
The amount of information that is lost during the translation to EM is
too large to ever generate truly efficient SPARC code.
.PP
To write such an efficient backend one needs to know, for example,
whether, when and what type of parameter is being computed, so the
result can be stored in the proper place and scratch registers can be
reused.
(On the SPARC processor, for example, it is very beneficial to pass the
first six parameters of a procedure call in registers instead of on the
stack.)
One way to express such things in EM is to insert extra messages in the
EM code.
The C statement \*(Sia = f(4, a + b);\*(So, for example, could be
translated to the following EM code:
.DS
.TS
;
l1f6 lf6 l.
lol	-4	! a
lol	-8	! b
mes	x, 2	! next instruction will compute 2nd parameter
adi	4
mes	x, 1	! next instruction will compute 1st parameter
loc	4
cal	_f	! call function f
lfr	4
stl	-4	! store result in a
.TE
.DE
For a code expander it is important that the \*(Simes\*(So pseudo
instructions appear \fIbefore\fR the EM instruction that computes the
parameter, because that way the final computation (the \*(Siadi\*(So
and the \*(Siloc\*(So in the previous example) can be translated to
machine code that performs the required computation and also puts the
result in the required place.
If it turns out to be too difficult for the frontend to insert these
\*(Simes\*(So instructions at the right place, the peephole optimizer
might swap the \*(Simes\*(So and the instruction that computes the
parameter.
.PP
For some architectures it is also possible to generate more efficient
code for a procedure when it is a so-called leaf procedure: a procedure
that does not call other procedures.
On the SPARC, for example, it is not necessary to rotate the register
window for a call to a leaf procedure, and it is also possible to use
the global registers for register variables in leaf procedures.
It will be a little harder to insert useful messages about leaf
procedures because, just as with register messages, they are only
useful to the backend when they appear immediately after or before the
\*(Sipro\*(So pseudo instruction.
The frontend, however, only knows whether a certain procedure is a leaf
procedure once it has already generated the entire procedure in EM.
Just as with the \*(Sipro ? / end n\*(So dilemma, the peephole optimizer
.[ [
Using Peephole Optimization
.]]
might be able to lend a hand by delaying EM code generation until it
has reached the end of the procedure.
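To sketch what such a message might look like, the fragment below marks
a small leaf procedure that merely adds its two parameters; as in the
earlier example, \*(Six\*(So stands for a message number that has yet
to be assigned, and 4-byte words with the first parameter at offset 0
are assumed.
.DS
mes   x         ! hypothetical message: the next procedure is a leaf
pro   _add2, 0  ! procedure without local variables
lol   0         ! first parameter
lol   4         ! second parameter
adi   4         ! add them
ret   4         ! return the 4-byte result
end   0
.DE
Since the frontend only knows that \*(Si_add2\*(So is a leaf once its
entire body has been generated, such a message would have to be
inserted after the fact, for instance by the peephole optimizer as
suggested above.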
.PP
As with most optimizations, the main problem is that they have to be
implemented with the \*(Simes\*(So pseudo instruction.
Because the \*(Simes\*(So instruction can have many different meanings
depending on its argument, it is important that all optimizers
recognize and respect them.
The addition of even a single message will require careful inspection
of, and perhaps even small changes to, each of the optimizers.
.bp