converted to texinfo - added developper's guide
This commit is contained in:
parent
3d3e2372c5
commit
30766d3cd1
1 changed files with 533 additions and 178 deletions
711
tcc-doc.texi
711
tcc-doc.texi
|
@ -1,52 +1,97 @@
|
|||
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<title> Tiny C Compiler Reference Documentation </title>
|
||||
</head>
|
||||
<body>
|
||||
\input texinfo @c -*- texinfo -*-
|
||||
|
||||
<center>
|
||||
<h1>
|
||||
Tiny C Compiler Reference Documentation
|
||||
</h1>
|
||||
</center>
|
||||
@settitle Tiny C Compiler Reference Documentation
|
||||
@titlepage
|
||||
@sp 7
|
||||
@center @titlefont{Tiny C Compiler Reference Documentation}
|
||||
@sp 3
|
||||
@end titlepage
|
||||
|
||||
<h2>Introduction</h2>
|
||||
@chapter Introduction
|
||||
|
||||
TinyCC (aka TCC) is a small but hyper fast C compiler. Unlike other C
|
||||
compilers, it is meant to be self-suffisant: you do not need an
|
||||
external assembler or linker because TCC does that for you.
|
||||
<P>
|
||||
TCC compiles so <em>fast</em> that even for big projects <tt>Makefile</tt>s may
|
||||
|
||||
TCC compiles so @emph{fast} that even for big projects @code{Makefile}s may
|
||||
not be necessary.
|
||||
<P>
|
||||
|
||||
TCC not only supports ANSI C, but also most of the new ISO C99
|
||||
standard and many GNUC extensions.
|
||||
<P>
|
||||
TCC can also be used to make <I>C scripts</I>,
|
||||
i.e. pieces of C source that you run as a Perl or Python
|
||||
script. Compilation is so fast that your script will be as fast as if
|
||||
it was an executable.
|
||||
<P>
|
||||
TCC can also automatically generate <A HREF="#bounds">memory and bound
|
||||
checks</A> while allowing all C pointers operations. TCC can do these
|
||||
checks even if non patched libraries are used.
|
||||
</P>
|
||||
|
||||
<h2>Full ANSI C support</h2>
|
||||
TCC can also be used to make @emph{C scripts}, i.e. pieces of C source
|
||||
that you run as a Perl or Python script. Compilation is so fast that
|
||||
your script will be as fast as if it was an executable.
|
||||
|
||||
TCC can also automatically generate memory and bound checks
|
||||
(@xref{bounds}) while allowing all C pointers operations. TCC can do
|
||||
these checks even if non patched libraries are used.
|
||||
|
||||
With @code{libtcc}, you can use TCC as a backend for dynamic code
|
||||
generation (@xref{libtcc}).
|
||||
|
||||
@node invoke
|
||||
@chapter Command line invocation
|
||||
|
||||
@example
|
||||
usage: tcc [-Idir] [-Dsym[=val]] [-Usym] [-llib] [-g] [-b]
|
||||
[-i infile] infile [infile_args...]
|
||||
@end example
|
||||
|
||||
@table @samp
|
||||
@item -Idir
|
||||
Specify an additionnal include path. The default ones are:
|
||||
@file{/usr/include}, @code{prefix}@file{/lib/tcc/include} (@code{prefix}
|
||||
is usually @file{/usr} or @file{/usr/local}).
|
||||
|
||||
@item -Dsym[=val]
|
||||
Define preprocessor symbol 'sym' to
|
||||
val. If val is not present, its value is '1'. Function-like macros can
|
||||
also be defined: @code{'-DF(a)=a+1'}
|
||||
|
||||
@item -Usym
|
||||
Undefine preprocessor symbol 'sym'.
|
||||
|
||||
@item -lxxx
|
||||
Dynamically link your program with library
|
||||
libxxx.so. Standard library paths are checked, including those
|
||||
specified with LD_LIBRARY_PATH.
|
||||
|
||||
@item -g
|
||||
Generate run time debug information so that you get clear run time
|
||||
error messages: @code{ test.c:68: in function 'test5()': dereferencing
|
||||
invalid pointer} instead of the laconic @code{Segmentation
|
||||
fault}.
|
||||
|
||||
@item -b
|
||||
Generate additionnal support code to check
|
||||
memory allocations and array/pointer bounds. '-g' is implied. Note
|
||||
that the generated code is slower and bigger in this case.
|
||||
|
||||
@item -i file
|
||||
Compile C source 'file' before main C source. With this
|
||||
command, multiple C files can be compiled and linked together.
|
||||
|
||||
@end table
|
||||
|
||||
Note: the @code{-o file} option to generate an ELF executable is
|
||||
currently unsupported.
|
||||
|
||||
@chapter C language support
|
||||
|
||||
@section ANSI C
|
||||
|
||||
TCC implements all the ANSI C standard, including structure bit fields
|
||||
and floating point numbers (<tt>long double</tt>, <tt>double</tt>, and
|
||||
<tt>float</tt> fully supported). The following limitations are known:
|
||||
and floating point numbers (@code{long double}, @code{double}, and
|
||||
@code{float} fully supported). The following limitations are known:
|
||||
|
||||
<ul>
|
||||
<li> The preprocessor tokens are the same as C. It means that in some
|
||||
@itemize
|
||||
@item The preprocessor tokens are the same as C. It means that in some
|
||||
rare cases, preprocessed numbers are not handled exactly as in ANSI
|
||||
C. This approach has the advantage of being simpler and FAST!
|
||||
</ul>
|
||||
@end itemize
|
||||
|
||||
<h2>ISOC99 extensions</h2>
|
||||
@section ISOC99 extensions
|
||||
|
||||
TCC implements many features of the new C standard: ISO C99. Currently
|
||||
missing items are: complex and imaginary numbers and variable length
|
||||
|
@ -54,77 +99,77 @@ arrays.
|
|||
|
||||
Currently implemented ISOC99 features:
|
||||
|
||||
<ul>
|
||||
@itemize
|
||||
|
||||
<li> 64 bit <tt>'long long'</tt> types are fully supported.
|
||||
@item 64 bit @code{'long long'} types are fully supported.
|
||||
|
||||
<li> The boolean type <tt>'_Bool'</tt> is supported.
|
||||
@item The boolean type @code{'_Bool'} is supported.
|
||||
|
||||
<li> <tt>'__func__'</tt> is a string variable containing the current
|
||||
@item @code{'__func__'} is a string variable containing the current
|
||||
function name.
|
||||
|
||||
<li> Variadic macros: <tt>__VA_ARGS__</tt> can be used for
|
||||
@item Variadic macros: @code{__VA_ARGS__} can be used for
|
||||
function-like macros:
|
||||
<PRE>
|
||||
@example
|
||||
#define dprintf(level, __VA_ARGS__) printf(__VA_ARGS__)
|
||||
</PRE>
|
||||
<tt>dprintf</tt> can then be used with a variable number of parameters.
|
||||
@end example
|
||||
@code{dprintf} can then be used with a variable number of parameters.
|
||||
|
||||
<li> Declarations can appear anywhere in a block (as in C++).
|
||||
@item Declarations can appear anywhere in a block (as in C++).
|
||||
|
||||
<li> Array and struct/union elements can be initialized in any order by
|
||||
@item Array and struct/union elements can be initialized in any order by
|
||||
using designators:
|
||||
<PRE>
|
||||
@example
|
||||
struct { int x, y; } st[10] = { [0].x = 1, [0].y = 2 };
|
||||
|
||||
int tab[10] = { 1, 2, [5] = 5, [9] = 9};
|
||||
</PRE>
|
||||
@end example
|
||||
|
||||
<li> Compound initializers are supported:
|
||||
<PRE>
|
||||
@item Compound initializers are supported:
|
||||
@example
|
||||
int *p = (int []){ 1, 2, 3 };
|
||||
</PRE>
|
||||
@end example
|
||||
to initialize a pointer pointing to an initialized array. The same
|
||||
works for structures and strings.
|
||||
|
||||
<li> Hexadecimal floating point constants are supported:
|
||||
<PRE>
|
||||
@item Hexadecimal floating point constants are supported:
|
||||
@example
|
||||
double d = 0x1234p10;
|
||||
</PRE>
|
||||
@end example
|
||||
is the same as writing
|
||||
<PRE>
|
||||
@example
|
||||
double d = 4771840.0;
|
||||
</PRE>
|
||||
@end example
|
||||
|
||||
<li> <tt>'inline'</tt> keyword is ignored.
|
||||
@item @code{'inline'} keyword is ignored.
|
||||
|
||||
<li> <tt>'restrict'</tt> keyword is ignored.
|
||||
</ul>
|
||||
@item @code{'restrict'} keyword is ignored.
|
||||
@end itemize
|
||||
|
||||
<h2>GNU C extensions</h2>
|
||||
@section GNU C extensions
|
||||
|
||||
TCC implements some GNU C extensions:
|
||||
|
||||
<ul>
|
||||
@itemize
|
||||
|
||||
<li> array designators can be used without '=':
|
||||
<PRE>
|
||||
@item array designators can be used without '=':
|
||||
@example
|
||||
int a[10] = { [0] 1, [5] 2, 3, 4 };
|
||||
</PRE>
|
||||
@end example
|
||||
|
||||
<li> Structure field designators can be a label:
|
||||
<PRE>
|
||||
@item Structure field designators can be a label:
|
||||
@example
|
||||
struct { int x, y; } st = { x: 1, y: 1};
|
||||
</PRE>
|
||||
@end example
|
||||
instead of
|
||||
<PRE>
|
||||
@example
|
||||
struct { int x, y; } st = { .x = 1, .y = 1};
|
||||
</PRE>
|
||||
@end example
|
||||
|
||||
<li> <tt>'\e'</tt> is ASCII character 27.
|
||||
@item @code{'\e'} is ASCII character 27.
|
||||
|
||||
<li> case ranges : ranges can be used in <tt>case</tt>s:
|
||||
<PRE>
|
||||
@item case ranges : ranges can be used in @code{case}s:
|
||||
@example
|
||||
switch(a) {
|
||||
case 1 ... 9:
|
||||
printf("range 1 to 9\n");
|
||||
|
@ -133,104 +178,104 @@ instead of
|
|||
printf("unexpected\n");
|
||||
break;
|
||||
}
|
||||
</PRE>
|
||||
@end example
|
||||
|
||||
<li> The keyword <tt>__attribute__</tt> is handled to specify variable or
|
||||
@item The keyword @code{__attribute__} is handled to specify variable or
|
||||
function attributes. The following attributes are supported:
|
||||
<ul>
|
||||
<li> <tt>aligned(n)</tt>: align data to n bytes (must be a power of two).
|
||||
@itemize
|
||||
@item @code{aligned(n)}: align data to n bytes (must be a power of two).
|
||||
|
||||
<li> <tt>section(name)</tt>: generate function or data in assembly
|
||||
@item @code{section(name)}: generate function or data in assembly
|
||||
section name (name is a string containing the section name) instead
|
||||
of the default section.
|
||||
|
||||
<li> <tt>unused</tt>: specify that the variable or the function is unused.
|
||||
@item @code{unused}: specify that the variable or the function is unused.
|
||||
|
||||
<li> <tt>cdecl</tt>: use standard C calling convention.
|
||||
@item @code{cdecl}: use standard C calling convention.
|
||||
|
||||
<li> <tt>stdcall</tt>: use Pascal-like calling convention.
|
||||
@item @code{stdcall}: use Pascal-like calling convention.
|
||||
|
||||
@end itemize
|
||||
|
||||
</ul>
|
||||
<BR>
|
||||
Here are some examples:
|
||||
<PRE>
|
||||
@example
|
||||
int a __attribute__ ((aligned(8), section(".mysection")));
|
||||
</PRE>
|
||||
<BR>
|
||||
align variable <tt>'a'</tt> to 8 bytes and put it in section <tt>.mysection</tt>.
|
||||
@end example
|
||||
|
||||
<PRE>
|
||||
align variable @code{'a'} to 8 bytes and put it in section @code{.mysection}.
|
||||
|
||||
@example
|
||||
int my_add(int a, int b) __attribute__ ((section(".mycodesection")))
|
||||
{
|
||||
return a + b;
|
||||
}
|
||||
</PRE>
|
||||
<BR>
|
||||
generate function <tt>'my_add'</tt> in section <tt>.mycodesection</tt>.
|
||||
</ul>
|
||||
@end example
|
||||
|
||||
<h2>TinyCC extensions</h2>
|
||||
generate function @code{'my_add'} in section @code{.mycodesection}.
|
||||
|
||||
I have added some extensions I find interesting:
|
||||
@item GNU style variadic macros:
|
||||
@example
|
||||
#define dprintf(fmt, args...) printf(fmt, ## args)
|
||||
|
||||
<ul>
|
||||
dprintf("no arg\n");
|
||||
dprintf("one arg %d\n", 1);
|
||||
@end example
|
||||
|
||||
<li> <tt>__TINYC__</tt> is a predefined macro to <tt>'1'</tt> to
|
||||
@end itemize
|
||||
|
||||
@section TinyCC extensions
|
||||
|
||||
@itemize
|
||||
|
||||
@item @code{__TINYC__} is a predefined macro to @code{'1'} to
|
||||
indicate that you use TCC.
|
||||
|
||||
<li> <tt>'#!'</tt> at the start of a line is ignored to allow scripting.
|
||||
@item @code{'#!'} at the start of a line is ignored to allow scripting.
|
||||
|
||||
<li> Binary digits can be entered (<tt>'0b101'</tt> instead of
|
||||
<tt>'5'</tt>).
|
||||
@item Binary digits can be entered (@code{'0b101'} instead of
|
||||
@code{'5'}).
|
||||
|
||||
<li> <tt>__BOUNDS_CHECKING_ON</tt> is defined if bound checking is activated.
|
||||
@item @code{__BOUNDS_CHECKING_ON} is defined if bound checking is activated.
|
||||
|
||||
</ul>
|
||||
@end itemize
|
||||
|
||||
<h2>TinyCC Memory and Bound checks</h2>
|
||||
<A NAME="bounds"></a>
|
||||
@node bounds
|
||||
@chapter TinyCC Memory and Bound checks
|
||||
|
||||
This feature is activated with the <A HREF="#invoke"><tt>'-b'</tt>
|
||||
option</A>.
|
||||
<P>
|
||||
Note that pointer size is <em>unchanged</em> and that code generated
|
||||
with bound checks is <em>fully compatible</em> with unchecked
|
||||
This feature is activated with the @code{'-b'} (@xref{invoke}).
|
||||
|
||||
Note that pointer size is @emph{unchanged} and that code generated
|
||||
with bound checks is @emph{fully compatible} with unchecked
|
||||
code. When a pointer comes from unchecked code, it is assumed to be
|
||||
valid. Even very obscure C code with casts should work correctly.
|
||||
</P>
|
||||
<P> To have more information about the ideas behind this method, <A
|
||||
HREF="http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html">check
|
||||
here</A>.
|
||||
</P>
|
||||
<P>
|
||||
|
||||
To have more information about the ideas behind this method, check at
|
||||
@url{http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html}.
|
||||
|
||||
Here are some examples of catched errors:
|
||||
</P>
|
||||
<TABLE BORDER=1>
|
||||
<TR>
|
||||
<TD>
|
||||
<PRE>
|
||||
|
||||
@table @asis
|
||||
|
||||
@item Invalid range with standard string function:
|
||||
@example
|
||||
{
|
||||
char tab[10];
|
||||
memset(tab, 0, 11);
|
||||
}
|
||||
</PRE>
|
||||
</TD><TD VALIGN=TOP>Invalid range with standard string function</TD>
|
||||
@end example
|
||||
|
||||
<TR>
|
||||
<TD>
|
||||
<PRE>
|
||||
@item Bound error in global or local arrays:
|
||||
@example
|
||||
{
|
||||
int tab[10];
|
||||
for(i=0;i<11;i++) {
|
||||
sum += tab[i];
|
||||
}
|
||||
}
|
||||
</PRE>
|
||||
</TD><TD VALIGN=TOP>Bound error in global or local arrays</TD>
|
||||
@end example
|
||||
|
||||
<TR>
|
||||
<TD>
|
||||
<PRE>
|
||||
@item Bound error in allocated data:
|
||||
@example
|
||||
{
|
||||
int *tab;
|
||||
tab = malloc(20 * sizeof(int));
|
||||
|
@ -239,12 +284,10 @@ Here are some examples of catched errors:
|
|||
}
|
||||
free(tab);
|
||||
}
|
||||
</PRE>
|
||||
</TD><TD VALIGN=TOP>Bound error in allocated data</TD>
|
||||
@end example
|
||||
|
||||
<TR>
|
||||
<TD>
|
||||
<PRE>
|
||||
@item Access to a freed region:
|
||||
@example
|
||||
{
|
||||
int *tab;
|
||||
tab = malloc(20 * sizeof(int));
|
||||
|
@ -253,70 +296,382 @@ Here are some examples of catched errors:
|
|||
sum += tab4[i];
|
||||
}
|
||||
}
|
||||
</PRE>
|
||||
</TD><TD VALIGN=TOP>Access to a freed region</TD>
|
||||
@end example
|
||||
|
||||
<TR>
|
||||
<TD>
|
||||
<PRE>
|
||||
@item Freeing an already freed region:
|
||||
@example
|
||||
{
|
||||
int *tab;
|
||||
tab = malloc(20 * sizeof(int));
|
||||
free(tab);
|
||||
free(tab);
|
||||
}
|
||||
</PRE>
|
||||
</TD><TD VALIGN=TOP>Freeing an already freed region</TD>
|
||||
@end example
|
||||
|
||||
</TABLE>
|
||||
@end table
|
||||
|
||||
<h2> Command line invocation </h2>
|
||||
<A NAME="invoke"></a>
|
||||
@node libtcc
|
||||
@chapter The @code{libtcc} library
|
||||
|
||||
<PRE>
|
||||
usage: tcc [-Idir] [-Dsym[=val]] [-Usym] [-llib] [-g] [-b]
|
||||
[-i infile] infile [infile_args...]
|
||||
</PRE>
|
||||
The @code{libtcc} library enables you to use TCC as a backend for
|
||||
dynamic code generation.
|
||||
|
||||
<table>
|
||||
<tr><td>'-Idir'</td>
|
||||
<td>Specify an additionnal include path. The default ones are:
|
||||
/usr/include, /usr/lib/tcc, /usr/local/lib/tcc.</td>
|
||||
Read the @file{libtcc.h} to have an overview of the API. Read
|
||||
@file{libtcc_test.c} to have a very simple example.
|
||||
|
||||
<tr><td>'-Dsym[=val]'</td> <td>Define preprocessor symbol 'sym' to
|
||||
val. If val is not present, its value is '1'. Function-like macros can
|
||||
also be defined: <tt>'-DF(a)=a+1'</tt></td>
|
||||
The idea consists in giving a C string containing the program you want
|
||||
to compile directly to @code{libtcc}. Then the @code{main()} function of
|
||||
the compiled string can be launched.
|
||||
|
||||
<tr><td>'-Usym'</td> <td>Undefine preprocessor symbol 'sym'.</td>
|
||||
@chapter Developper's guide
|
||||
|
||||
<tr><td>'-lxxx'</td>
|
||||
<td>Dynamically link your program with library
|
||||
libxxx.so. Standard library paths are checked, including those
|
||||
specified with LD_LIBRARY_PATH.</td>
|
||||
This chapter gives some hints to understand how TCC works. You can skip
|
||||
it if you do not intend to modify the TCC code.
|
||||
|
||||
<tr><td>'-g'</td>
|
||||
<td>Generate run time debug information so that you get clear run time
|
||||
error messages: <tt> test.c:68: in function 'test5()': dereferencing
|
||||
invalid pointer</tt> instead of the laconic <tt>Segmentation
|
||||
fault</tt>.
|
||||
</td>
|
||||
@section File reading
|
||||
|
||||
<tr><td>'-b'</td> <td>Generate additionnal support code to check
|
||||
memory allocations and array/pointer bounds. '-g' is implied. Note
|
||||
that the generated code is slower and bigger in this case.
|
||||
</td>
|
||||
The @code{BufferedFile} structure contains the context needed to read a
|
||||
file, including the current line number. @code{tcc_open()} opens a new
|
||||
file and @code{tcc_close()} closes it. @code{inp()} returns the next
|
||||
character.
|
||||
|
||||
<tr><td>'-i file'</td>
|
||||
<td>Compile C source 'file' before main C source. With this
|
||||
command, multiple C files can be compiled and linked together.</td>
|
||||
</table>
|
||||
<br>
|
||||
Note: the <tt>'-o file'</tt> option to generate an ELF executable is
|
||||
currently unsupported.
|
||||
@section Lexer
|
||||
|
||||
<hr>
|
||||
Copyright (c) 2001, 2002 Fabrice Bellard <hr>
|
||||
Fabrice Bellard - <em> fabrice.bellard at free.fr </em> - <A HREF="http://bellard.org/"> http://bellard.org/ </A> - <A HREF="http://www.tinycc.org/"> http://www.tinycc.org/ </A>
|
||||
@code{next()} reads the next token in the current
|
||||
file. @code{next_nomacro()} reads the next token without macro
|
||||
expansion.
|
||||
|
||||
@code{tok} contains the current token (see @code{TOK_xxx})
|
||||
constants. Identifiers and keywords are also keywords. @code{tokc}
|
||||
contains additionnal infos about the token (for example a constant value
|
||||
if number or string token).
|
||||
|
||||
@section Parser
|
||||
|
||||
The parser is hardcoded (yacc is not necessary). It does only one pass,
|
||||
except:
|
||||
|
||||
@itemize
|
||||
|
||||
@item For initialized arrays with unknown size, a first pass
|
||||
is done to count the number of elements.
|
||||
|
||||
@item For architectures where arguments are evaluated in
|
||||
reverse order, a first pass is done to reverse the argument order.
|
||||
|
||||
@end itemize
|
||||
|
||||
@section Types
|
||||
|
||||
The types are stored in a single 'int' variable. It was choosen in the
|
||||
first stages of development when tcc was much simpler. Now, it may not
|
||||
be the best solution.
|
||||
|
||||
@example
|
||||
#define VT_INT 0 /* integer type */
|
||||
#define VT_BYTE 1 /* signed byte type */
|
||||
#define VT_SHORT 2 /* short type */
|
||||
#define VT_VOID 3 /* void type */
|
||||
#define VT_PTR 4 /* pointer */
|
||||
#define VT_ENUM 5 /* enum definition */
|
||||
#define VT_FUNC 6 /* function type */
|
||||
#define VT_STRUCT 7 /* struct/union definition */
|
||||
#define VT_FLOAT 8 /* IEEE float */
|
||||
#define VT_DOUBLE 9 /* IEEE double */
|
||||
#define VT_LDOUBLE 10 /* IEEE long double */
|
||||
#define VT_BOOL 11 /* ISOC99 boolean type */
|
||||
#define VT_LLONG 12 /* 64 bit integer */
|
||||
#define VT_LONG 13 /* long integer (NEVER USED as type, only
|
||||
during parsing) */
|
||||
#define VT_BTYPE 0x000f /* mask for basic type */
|
||||
#define VT_UNSIGNED 0x0010 /* unsigned type */
|
||||
#define VT_ARRAY 0x0020 /* array type (also has VT_PTR) */
|
||||
#define VT_BITFIELD 0x0040 /* bitfield modifier */
|
||||
|
||||
#define VT_STRUCT_SHIFT 16 /* structure/enum name shift (16 bits left) */
|
||||
@end example
|
||||
|
||||
When a reference to another type is needed (for pointers, functions and
|
||||
structures), the @code{32 - VT_STRUCT_SHIFT} high order bits are used to
|
||||
store an identifier reference.
|
||||
|
||||
The @code{VT_UNSIGNED} flag can be set for chars, shorts, ints and long
|
||||
longs.
|
||||
|
||||
Arrays are considered as pointers @code{VT_PTR} with the flag
|
||||
@code{VT_ARRAY} set.
|
||||
|
||||
The @code{VT_BITFIELD} flag can be set for chars, shorts, ints and long
|
||||
longs. If it is set, then the bitfield position is stored from bits
|
||||
VT_STRUCT_SHIFT to VT_STRUCT_SHIFT + 5 and the bit field size is stored
|
||||
from bits VT_STRUCT_SHIFT + 6 to VT_STRUCT_SHIFT + 11.
|
||||
|
||||
@code{VT_LONG} is never used except during parsing.
|
||||
|
||||
During parsing, the storage of an object is also stored in the type
|
||||
integer:
|
||||
|
||||
@example
|
||||
#define VT_EXTERN 0x00000080 /* extern definition */
|
||||
#define VT_STATIC 0x00000100 /* static variable */
|
||||
#define VT_TYPEDEF 0x00000200 /* typedef definition */
|
||||
@end example
|
||||
|
||||
@section Symbols
|
||||
|
||||
All symbols are stored in hashed symbol stacks. Each symbol stack
|
||||
contains @code{Sym} structures.
|
||||
|
||||
@code{Sym.v} contains the symbol name (remember
|
||||
an idenfier is also a token, so a string is never necessary to store
|
||||
it). @code{Sym.t} gives the type of the symbol. @code{Sym.r} is usually
|
||||
the register in which the corresponding variable is stored. @code{Sym.c} is
|
||||
usually a constant associated to the symbol.
|
||||
|
||||
Four main symbol stacks are defined:
|
||||
|
||||
@table @code
|
||||
|
||||
@item define_stack
|
||||
for the macros (@code{#define}s).
|
||||
|
||||
@item global_stack
|
||||
for the global variables, functions and types.
|
||||
|
||||
@item extern_stack
|
||||
for the external symbols shared between files.
|
||||
|
||||
@item local_stack
|
||||
for the local variables, functions and types.
|
||||
|
||||
@item label_stack
|
||||
for the local labels (for @code{goto}).
|
||||
|
||||
@end table
|
||||
|
||||
@code{sym_push()} is used to add a new symbol in the local symbol
|
||||
stack. If no local symbol stack is active, it is added in the global
|
||||
symbol stack.
|
||||
|
||||
@code{sym_pop(st,b)} pops symbols from the symbol stack @var{st} until
|
||||
the symbol @var{b} is on the top of stack. If @var{b} is NULL, the stack
|
||||
is emptied.
|
||||
|
||||
@code{sym_find(v)} return the symbol associated to the identifier
|
||||
@var{v}. The local stack is searched first from top to bottom, then the
|
||||
global stack.
|
||||
|
||||
@section Sections
|
||||
|
||||
The generated code and datas are written in sections. The structure
|
||||
@code{Section} contains all the necessary information for a given
|
||||
section. @code{new_section()} creates a new section. ELF file semantics
|
||||
is assumed for each section.
|
||||
|
||||
The following sections are predefined:
|
||||
|
||||
@table @code
|
||||
|
||||
@item text_section
|
||||
is the section containing the generated code. @var{ind} contains the
|
||||
current position in the code section.
|
||||
|
||||
@item data_section
|
||||
contains initialized data
|
||||
|
||||
@item bss_section
|
||||
contains uninitialized data
|
||||
|
||||
@item bounds_section
|
||||
@itemx lbounds_section
|
||||
are used when bound checking is activated
|
||||
|
||||
@item stab_section
|
||||
@itemx stabstr_section
|
||||
are used when debugging is actived to store debug information
|
||||
|
||||
@item symtab_section
|
||||
@itemx strtab_section
|
||||
contain the exported symbols (currently only used for debugging).
|
||||
|
||||
@end table
|
||||
|
||||
@section Code generation
|
||||
|
||||
@subsection Introduction
|
||||
|
||||
The TCC code generator directly generates linked binary code in one
|
||||
pass. It is rather unusual these days (see gcc for example which
|
||||
generates text assembly), but it allows to be very fast and surprisingly
|
||||
not so complicated.
|
||||
|
||||
The TCC code generator is register based. Optimization is only done at
|
||||
the expression level. No intermediate representation of expression is
|
||||
kept except the current values stored in the @emph{value stack}.
|
||||
|
||||
On x86, three temporary registers are used. When more registers are
|
||||
needed, one register is flushed in a new local variable.
|
||||
|
||||
@subsection The value stack
|
||||
|
||||
When an expression is parsed, its value is pushed on the value stack
|
||||
(@var{vstack}). The top of the value stack is @var{vtop}. Each value
|
||||
stack entry is the structure @code{SValue}.
|
||||
|
||||
@code{SValue.t} is the type. @code{SValue.r} indicates how the value is
|
||||
currently stored in the generated code. It is usually a CPU register
|
||||
index (@code{REG_xxx} constants), but additionnal values and flags are
|
||||
defined:
|
||||
|
||||
@example
|
||||
#define VT_CONST 0x00f0 /* constant in vc
|
||||
(must be first non register value) */
|
||||
#define VT_LLOCAL 0x00f1 /* lvalue, offset on stack */
|
||||
#define VT_LOCAL 0x00f2 /* offset on stack */
|
||||
#define VT_CMP 0x00f3 /* the value is stored in processor flags (in vc) */
|
||||
#define VT_JMP 0x00f4 /* value is the consequence of jmp true (even) */
|
||||
#define VT_JMPI 0x00f5 /* value is the consequence of jmp false (odd) */
|
||||
#define VT_LVAL 0x0100 /* var is an lvalue */
|
||||
#define VT_FORWARD 0x0200 /* value is forward reference */
|
||||
#define VT_MUSTCAST 0x0400 /* value must be casted to be correct (used for
|
||||
char/short stored in integer registers) */
|
||||
#define VT_MUSTBOUND 0x0800 /* bound checking must be done before
|
||||
dereferencing value */
|
||||
#define VT_BOUNDED 0x8000 /* value is bounded. The address of the
|
||||
bounding function call point is in vc */
|
||||
#define VT_LVAL_BYTE 0x1000 /* lvalue is a byte */
|
||||
#define VT_LVAL_SHORT 0x2000 /* lvalue is a short */
|
||||
#define VT_LVAL_UNSIGNED 0x4000 /* lvalue is unsigned */
|
||||
#define VT_LVAL_TYPE (VT_LVAL_BYTE | VT_LVAL_SHORT | VT_LVAL_UNSIGNED)
|
||||
@end example
|
||||
|
||||
@table @code
|
||||
|
||||
@item VT_CONST
|
||||
indicates that the value is a constant. It is stored in the union
|
||||
@code{SValue.c}, depending on its type.
|
||||
|
||||
@item VT_LOCAL
|
||||
indicates a local variable pointer at offset @code{SValue.c.i} in the
|
||||
stack.
|
||||
|
||||
@item VT_CMP
|
||||
indicates that the value is actually stored in the CPU flags (i.e. the
|
||||
value is the consequence of a test). The value is either 0 or 1. The
|
||||
actual CPU flags used is indicated in @code{SValue.c.i}.
|
||||
|
||||
@item VT_JMP
|
||||
@itemx VT_JMPI
|
||||
indicates that the value is the consequence of a jmp. For VT_JMP, it is
|
||||
1 if the jump is taken, 0 otherwise. For VT_JMPI it is inverted.
|
||||
|
||||
These values are used to compile the @code{||} and @code{&&} logical
|
||||
operators.
|
||||
|
||||
@item VT_LVAL
|
||||
is a flag indicating that the value is actually an lvalue (left value of
|
||||
an assignment). It means that the value stored is actually a pointer to
|
||||
the wanted value.
|
||||
|
||||
Understanding the use @code{VT_LVAL} is very important if you want to
|
||||
understand how TCC works.
|
||||
|
||||
@item VT_LVAL_BYTE
|
||||
@itemx VT_LVAL_SHORT
|
||||
@itemx VT_LVAL_UNSIGNED
|
||||
if the lvalue has an integer type, then these flags give its real
|
||||
type. The type alone is not suffisant in case of cast optimisations.
|
||||
|
||||
@item VT_LLOCAL
|
||||
is a saved lvalue on the stack. @code{VT_LLOCAL} should be suppressed
|
||||
ASAP because its semantics are rather complicated.
|
||||
|
||||
@item VT_MUSTCAST
|
||||
indicates that a cast to the value type must be performed if the value
|
||||
is used (lazy casting).
|
||||
|
||||
@item VT_FORWARD
|
||||
indicates that the value is a forward reference to a variable or a function.
|
||||
|
||||
@item VT_MUSTBOUND
|
||||
@itemx VT_BOUNDED
|
||||
are only used for optional bound checking.
|
||||
|
||||
@end table
|
||||
|
||||
@subsection Manipulating the value stack
|
||||
|
||||
@code{vsetc()} and @code{vset()} pushes a new value on the value
|
||||
stack. If the previous @code{vtop} was stored in a very unsafe place(for
|
||||
example in the CPU flags), then some code is generated to put the
|
||||
previous @code{vtop} in a safe storage.
|
||||
|
||||
@code{vpop()} pops @code{vtop}. In some cases, it also generates cleanup
|
||||
code (for example if stacked floating point registers are used as on
|
||||
x86).
|
||||
|
||||
The @code{gv(rc)} function generates code to evaluate @code{vtop} (the
|
||||
top value of the stack) into registers. @var{rc} selects in which
|
||||
register class the value should be put. @code{gv()} is the @emph{most
|
||||
important function} of the code generator.
|
||||
|
||||
@code{gv2()} is the same as @code{gv()} but for the top two stack
|
||||
entries.
|
||||
|
||||
@subsection CPU dependent code generation
|
||||
|
||||
See the @file{i386-gen.c} file to have an example.
|
||||
|
||||
@table @code
|
||||
|
||||
@item load()
|
||||
must generate the code needed to load a stack value into a register.
|
||||
|
||||
@item store()
|
||||
must generate the code needed to store a register into a stack value
|
||||
lvalue.
|
||||
|
||||
@item gfunc_start()
|
||||
@itemx gfunc_param()
|
||||
@itemx gfunc_call()
|
||||
should generate a function call
|
||||
|
||||
@item gfunc_prolog()
|
||||
@itemx gfunc_epilog()
|
||||
should generate a function prolog/epilog.
|
||||
|
||||
@item gen_opi(op)
|
||||
must generate the binary integer operation @var{op} on the two top
|
||||
entries of the stack which are guaranted to contain integer types.
|
||||
|
||||
The result value should be put on the stack.
|
||||
|
||||
@item gen_opf(op)
|
||||
same as @code{gen_opi()} for floating point operations. The two top
|
||||
entries of the stack are guaranted to contain floating point values of
|
||||
same types.
|
||||
|
||||
@item gen_cvt_itof()
|
||||
integer to floating point conversion.
|
||||
|
||||
@item gen_cvt_ftoi()
|
||||
floating point to integer conversion.
|
||||
|
||||
@item gen_cvt_ftof()
|
||||
floating point to floating point of different size conversion.
|
||||
|
||||
@item gen_bounded_ptr_add()
|
||||
@item gen_bounded_ptr_deref()
|
||||
are only used for bound checking.
|
||||
|
||||
@end table
|
||||
|
||||
@section Optimizations done
|
||||
|
||||
Constant propagation is done for all operations. Multiplications and
|
||||
divisions are optimized to shifts when appropriate. Comparison
|
||||
operators are optimized by maintaining a special cache for the
|
||||
processor flags. &&, || and ! are optimized by maintaining a special
|
||||
'jump target' value. No other jump optimization is currently performed
|
||||
because it would require to store the code in a more abstract fashion.
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
Loading…
Reference in a new issue