diff --git a/tcc-doc.texi b/tcc-doc.texi index 364d9cd2..7a5b9775 100644 --- a/tcc-doc.texi +++ b/tcc-doc.texi @@ -1,52 +1,97 @@ - - - - - Tiny C Compiler Reference Documentation - - +\input texinfo @c -*- texinfo -*- -
-

-Tiny C Compiler Reference Documentation -

-
+@settitle Tiny C Compiler Reference Documentation +@titlepage +@sp 7 +@center @titlefont{Tiny C Compiler Reference Documentation} +@sp 3 +@end titlepage -

Introduction

+@chapter Introduction TinyCC (aka TCC) is a small but hyper fast C compiler. Unlike other C compilers, it is meant to be self-suffisant: you do not need an external assembler or linker because TCC does that for you. -

-TCC compiles so fast that even for big projects Makefiles may + +TCC compiles so @emph{fast} that even for big projects @code{Makefile}s may not be necessary. -

+ TCC not only supports ANSI C, but also most of the new ISO C99 standard and many GNUC extensions. -

-TCC can also be used to make C scripts, -i.e. pieces of C source that you run as a Perl or Python -script. Compilation is so fast that your script will be as fast as if -it was an executable. -

-TCC can also automatically generate memory and bound -checks while allowing all C pointers operations. TCC can do these -checks even if non patched libraries are used. -

-

Full ANSI C support

+TCC can also be used to make @emph{C scripts}, i.e. pieces of C source +that you run as a Perl or Python script. Compilation is so fast that +your script will be as fast as if it was an executable. + +TCC can also automatically generate memory and bound checks +(@xref{bounds}) while allowing all C pointers operations. TCC can do +these checks even if non patched libraries are used. + +With @code{libtcc}, you can use TCC as a backend for dynamic code +generation (@xref{libtcc}). + +@node invoke +@chapter Command line invocation + +@example +usage: tcc [-Idir] [-Dsym[=val]] [-Usym] [-llib] [-g] [-b] + [-i infile] infile [infile_args...] +@end example + +@table @samp +@item -Idir +Specify an additionnal include path. The default ones are: +@file{/usr/include}, @code{prefix}@file{/lib/tcc/include} (@code{prefix} +is usually @file{/usr} or @file{/usr/local}). + +@item -Dsym[=val] +Define preprocessor symbol 'sym' to +val. If val is not present, its value is '1'. Function-like macros can +also be defined: @code{'-DF(a)=a+1'} + +@item -Usym +Undefine preprocessor symbol 'sym'. + +@item -lxxx +Dynamically link your program with library +libxxx.so. Standard library paths are checked, including those +specified with LD_LIBRARY_PATH. + +@item -g +Generate run time debug information so that you get clear run time +error messages: @code{ test.c:68: in function 'test5()': dereferencing +invalid pointer} instead of the laconic @code{Segmentation +fault}. + +@item -b +Generate additionnal support code to check +memory allocations and array/pointer bounds. '-g' is implied. Note +that the generated code is slower and bigger in this case. + +@item -i file +Compile C source 'file' before main C source. With this +command, multiple C files can be compiled and linked together. + +@end table + +Note: the @code{-o file} option to generate an ELF executable is +currently unsupported. + +@chapter C language support + +@section ANSI C TCC implements all the ANSI C standard, including structure bit fields -and floating point numbers (long double, double, and -float fully supported). The following limitations are known: +and floating point numbers (@code{long double}, @code{double}, and +@code{float} fully supported). The following limitations are known: - +@end itemize -

ISOC99 extensions

+@section ISOC99 extensions TCC implements many features of the new C standard: ISO C99. Currently missing items are: complex and imaginary numbers and variable length @@ -54,77 +99,77 @@ arrays. Currently implemented ISOC99 features: - +@item @code{'restrict'} keyword is ignored. +@end itemize -

GNU C extensions

+@section GNU C extensions TCC implements some GNU C extensions: - +@end example -

TinyCC extensions

+generate function @code{'my_add'} in section @code{.mycodesection}. -I have added some extensions I find interesting: +@item GNU style variadic macros: +@example + #define dprintf(fmt, args...) printf(fmt, ## args) - +@end itemize -

TinyCC Memory and Bound checks

- +@node bounds +@chapter TinyCC Memory and Bound checks -This feature is activated with the '-b' -option. -

-Note that pointer size is unchanged and that code generated -with bound checks is fully compatible with unchecked +This feature is activated with the @code{'-b'} (@xref{invoke}). + +Note that pointer size is @emph{unchanged} and that code generated +with bound checks is @emph{fully compatible} with unchecked code. When a pointer comes from unchecked code, it is assumed to be valid. Even very obscure C code with casts should work correctly. -

-

To have more information about the ideas behind this method, check -here. -

-

+ +To have more information about the ideas behind this method, check at +@url{http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html}. + Here are some examples of catched errors: -

- - - +@end example - - +@end example - - +@end example - - +@end example - - +@end example -
-
+
+@table @asis
+
+@item Invalid range with standard string function:
+@example
 {
     char tab[10];
     memset(tab, 0, 11);
 }
-
-
Invalid range with standard string function
-
+@item Bound error in global or local arrays:
+@example
 {
     int tab[10];
     for(i=0;i<11;i++) {
         sum += tab[i];
     }
 }
-
-
Bound error in global or local arrays
-
+@item Bound error in allocated data:
+@example
 {
     int *tab;
     tab = malloc(20 * sizeof(int));
@@ -239,12 +284,10 @@ Here are some examples of catched errors:
     }
     free(tab);
 }
-
-
Bound error in allocated data
-
+@item Access to a freed region:
+@example
 {
     int *tab;
     tab = malloc(20 * sizeof(int));
@@ -253,70 +296,382 @@ Here are some examples of catched errors:
         sum += tab4[i];
     }
 }
-
-
Access to a freed region
-
+@item Freeing an already freed region:
+@example
 {
     int *tab;
     tab = malloc(20 * sizeof(int));
     free(tab);
     free(tab);
 }
-
-
Freeing an already freed region
+@end table -

Command line invocation

- +@node libtcc +@chapter The @code{libtcc} library -
-usage: tcc [-Idir] [-Dsym[=val]] [-Usym] [-llib] [-g] [-b]
-           [-i infile] infile [infile_args...]
-
+The @code{libtcc} library enables you to use TCC as a backend for +dynamic code generation. - - - +Read the @file{libtcc.h} to have an overview of the API. Read +@file{libtcc_test.c} to have a very simple example. - +The idea consists in giving a C string containing the program you want +to compile directly to @code{libtcc}. Then the @code{main()} function of +the compiled string can be launched. - +@chapter Developper's guide - - +This chapter gives some hints to understand how TCC works. You can skip +it if you do not intend to modify the TCC code. - - +@section File reading - +The @code{BufferedFile} structure contains the context needed to read a +file, including the current line number. @code{tcc_open()} opens a new +file and @code{tcc_close()} closes it. @code{inp()} returns the next +character. - - -
'-Idir'Specify an additionnal include path. The default ones are: -/usr/include, /usr/lib/tcc, /usr/local/lib/tcc.
'-Dsym[=val]' Define preprocessor symbol 'sym' to -val. If val is not present, its value is '1'. Function-like macros can -also be defined: '-DF(a)=a+1'
'-Usym' Undefine preprocessor symbol 'sym'.
'-lxxx'Dynamically link your program with library -libxxx.so. Standard library paths are checked, including those -specified with LD_LIBRARY_PATH.
'-g'Generate run time debug information so that you get clear run time -error messages: test.c:68: in function 'test5()': dereferencing -invalid pointer instead of the laconic Segmentation -fault. -
'-b' Generate additionnal support code to check -memory allocations and array/pointer bounds. '-g' is implied. Note -that the generated code is slower and bigger in this case. -
'-i file'Compile C source 'file' before main C source. With this -command, multiple C files can be compiled and linked together.
-
-Note: the '-o file' option to generate an ELF executable is -currently unsupported. +@section Lexer -
-Copyright (c) 2001, 2002 Fabrice Bellard
-Fabrice Bellard - fabrice.bellard at free.fr - http://bellard.org/ - http://www.tinycc.org/ +@code{next()} reads the next token in the current +file. @code{next_nomacro()} reads the next token without macro +expansion. + +@code{tok} contains the current token (see @code{TOK_xxx}) +constants. Identifiers and keywords are also keywords. @code{tokc} +contains additionnal infos about the token (for example a constant value +if number or string token). + +@section Parser + +The parser is hardcoded (yacc is not necessary). It does only one pass, +except: + +@itemize + +@item For initialized arrays with unknown size, a first pass +is done to count the number of elements. + +@item For architectures where arguments are evaluated in +reverse order, a first pass is done to reverse the argument order. + +@end itemize + +@section Types + +The types are stored in a single 'int' variable. It was choosen in the +first stages of development when tcc was much simpler. Now, it may not +be the best solution. + +@example +#define VT_INT 0 /* integer type */ +#define VT_BYTE 1 /* signed byte type */ +#define VT_SHORT 2 /* short type */ +#define VT_VOID 3 /* void type */ +#define VT_PTR 4 /* pointer */ +#define VT_ENUM 5 /* enum definition */ +#define VT_FUNC 6 /* function type */ +#define VT_STRUCT 7 /* struct/union definition */ +#define VT_FLOAT 8 /* IEEE float */ +#define VT_DOUBLE 9 /* IEEE double */ +#define VT_LDOUBLE 10 /* IEEE long double */ +#define VT_BOOL 11 /* ISOC99 boolean type */ +#define VT_LLONG 12 /* 64 bit integer */ +#define VT_LONG 13 /* long integer (NEVER USED as type, only + during parsing) */ +#define VT_BTYPE 0x000f /* mask for basic type */ +#define VT_UNSIGNED 0x0010 /* unsigned type */ +#define VT_ARRAY 0x0020 /* array type (also has VT_PTR) */ +#define VT_BITFIELD 0x0040 /* bitfield modifier */ + +#define VT_STRUCT_SHIFT 16 /* structure/enum name shift (16 bits left) */ +@end example + +When a reference to another type is needed (for pointers, functions and +structures), the @code{32 - VT_STRUCT_SHIFT} high order bits are used to +store an identifier reference. + +The @code{VT_UNSIGNED} flag can be set for chars, shorts, ints and long +longs. + +Arrays are considered as pointers @code{VT_PTR} with the flag +@code{VT_ARRAY} set. + +The @code{VT_BITFIELD} flag can be set for chars, shorts, ints and long +longs. If it is set, then the bitfield position is stored from bits +VT_STRUCT_SHIFT to VT_STRUCT_SHIFT + 5 and the bit field size is stored +from bits VT_STRUCT_SHIFT + 6 to VT_STRUCT_SHIFT + 11. + +@code{VT_LONG} is never used except during parsing. + +During parsing, the storage of an object is also stored in the type +integer: + +@example +#define VT_EXTERN 0x00000080 /* extern definition */ +#define VT_STATIC 0x00000100 /* static variable */ +#define VT_TYPEDEF 0x00000200 /* typedef definition */ +@end example + +@section Symbols + +All symbols are stored in hashed symbol stacks. Each symbol stack +contains @code{Sym} structures. + +@code{Sym.v} contains the symbol name (remember +an idenfier is also a token, so a string is never necessary to store +it). @code{Sym.t} gives the type of the symbol. @code{Sym.r} is usually +the register in which the corresponding variable is stored. @code{Sym.c} is +usually a constant associated to the symbol. + +Four main symbol stacks are defined: + +@table @code + +@item define_stack +for the macros (@code{#define}s). + +@item global_stack +for the global variables, functions and types. + +@item extern_stack +for the external symbols shared between files. + +@item local_stack +for the local variables, functions and types. + +@item label_stack +for the local labels (for @code{goto}). + +@end table + +@code{sym_push()} is used to add a new symbol in the local symbol +stack. If no local symbol stack is active, it is added in the global +symbol stack. + +@code{sym_pop(st,b)} pops symbols from the symbol stack @var{st} until +the symbol @var{b} is on the top of stack. If @var{b} is NULL, the stack +is emptied. + +@code{sym_find(v)} return the symbol associated to the identifier +@var{v}. The local stack is searched first from top to bottom, then the +global stack. + +@section Sections + +The generated code and datas are written in sections. The structure +@code{Section} contains all the necessary information for a given +section. @code{new_section()} creates a new section. ELF file semantics +is assumed for each section. + +The following sections are predefined: + +@table @code + +@item text_section +is the section containing the generated code. @var{ind} contains the +current position in the code section. + +@item data_section +contains initialized data + +@item bss_section +contains uninitialized data + +@item bounds_section +@itemx lbounds_section +are used when bound checking is activated + +@item stab_section +@itemx stabstr_section +are used when debugging is actived to store debug information + +@item symtab_section +@itemx strtab_section +contain the exported symbols (currently only used for debugging). + +@end table + +@section Code generation + +@subsection Introduction + +The TCC code generator directly generates linked binary code in one +pass. It is rather unusual these days (see gcc for example which +generates text assembly), but it allows to be very fast and surprisingly +not so complicated. + +The TCC code generator is register based. Optimization is only done at +the expression level. No intermediate representation of expression is +kept except the current values stored in the @emph{value stack}. + +On x86, three temporary registers are used. When more registers are +needed, one register is flushed in a new local variable. + +@subsection The value stack + +When an expression is parsed, its value is pushed on the value stack +(@var{vstack}). The top of the value stack is @var{vtop}. Each value +stack entry is the structure @code{SValue}. + +@code{SValue.t} is the type. @code{SValue.r} indicates how the value is +currently stored in the generated code. It is usually a CPU register +index (@code{REG_xxx} constants), but additionnal values and flags are +defined: + +@example +#define VT_CONST 0x00f0 /* constant in vc + (must be first non register value) */ +#define VT_LLOCAL 0x00f1 /* lvalue, offset on stack */ +#define VT_LOCAL 0x00f2 /* offset on stack */ +#define VT_CMP 0x00f3 /* the value is stored in processor flags (in vc) */ +#define VT_JMP 0x00f4 /* value is the consequence of jmp true (even) */ +#define VT_JMPI 0x00f5 /* value is the consequence of jmp false (odd) */ +#define VT_LVAL 0x0100 /* var is an lvalue */ +#define VT_FORWARD 0x0200 /* value is forward reference */ +#define VT_MUSTCAST 0x0400 /* value must be casted to be correct (used for + char/short stored in integer registers) */ +#define VT_MUSTBOUND 0x0800 /* bound checking must be done before + dereferencing value */ +#define VT_BOUNDED 0x8000 /* value is bounded. The address of the + bounding function call point is in vc */ +#define VT_LVAL_BYTE 0x1000 /* lvalue is a byte */ +#define VT_LVAL_SHORT 0x2000 /* lvalue is a short */ +#define VT_LVAL_UNSIGNED 0x4000 /* lvalue is unsigned */ +#define VT_LVAL_TYPE (VT_LVAL_BYTE | VT_LVAL_SHORT | VT_LVAL_UNSIGNED) +@end example + +@table @code + +@item VT_CONST +indicates that the value is a constant. It is stored in the union +@code{SValue.c}, depending on its type. + +@item VT_LOCAL +indicates a local variable pointer at offset @code{SValue.c.i} in the +stack. + +@item VT_CMP +indicates that the value is actually stored in the CPU flags (i.e. the +value is the consequence of a test). The value is either 0 or 1. The +actual CPU flags used is indicated in @code{SValue.c.i}. + +@item VT_JMP +@itemx VT_JMPI +indicates that the value is the consequence of a jmp. For VT_JMP, it is +1 if the jump is taken, 0 otherwise. For VT_JMPI it is inverted. + +These values are used to compile the @code{||} and @code{&&} logical +operators. + +@item VT_LVAL +is a flag indicating that the value is actually an lvalue (left value of +an assignment). It means that the value stored is actually a pointer to +the wanted value. + +Understanding the use @code{VT_LVAL} is very important if you want to +understand how TCC works. + +@item VT_LVAL_BYTE +@itemx VT_LVAL_SHORT +@itemx VT_LVAL_UNSIGNED +if the lvalue has an integer type, then these flags give its real +type. The type alone is not suffisant in case of cast optimisations. + +@item VT_LLOCAL +is a saved lvalue on the stack. @code{VT_LLOCAL} should be suppressed +ASAP because its semantics are rather complicated. + +@item VT_MUSTCAST +indicates that a cast to the value type must be performed if the value +is used (lazy casting). + +@item VT_FORWARD +indicates that the value is a forward reference to a variable or a function. + +@item VT_MUSTBOUND +@itemx VT_BOUNDED +are only used for optional bound checking. + +@end table + +@subsection Manipulating the value stack + +@code{vsetc()} and @code{vset()} pushes a new value on the value +stack. If the previous @code{vtop} was stored in a very unsafe place(for +example in the CPU flags), then some code is generated to put the +previous @code{vtop} in a safe storage. + +@code{vpop()} pops @code{vtop}. In some cases, it also generates cleanup +code (for example if stacked floating point registers are used as on +x86). + +The @code{gv(rc)} function generates code to evaluate @code{vtop} (the +top value of the stack) into registers. @var{rc} selects in which +register class the value should be put. @code{gv()} is the @emph{most +important function} of the code generator. + +@code{gv2()} is the same as @code{gv()} but for the top two stack +entries. + +@subsection CPU dependent code generation + +See the @file{i386-gen.c} file to have an example. + +@table @code + +@item load() +must generate the code needed to load a stack value into a register. + +@item store() +must generate the code needed to store a register into a stack value +lvalue. + +@item gfunc_start() +@itemx gfunc_param() +@itemx gfunc_call() +should generate a function call + +@item gfunc_prolog() +@itemx gfunc_epilog() +should generate a function prolog/epilog. + +@item gen_opi(op) +must generate the binary integer operation @var{op} on the two top +entries of the stack which are guaranted to contain integer types. + +The result value should be put on the stack. + +@item gen_opf(op) +same as @code{gen_opi()} for floating point operations. The two top +entries of the stack are guaranted to contain floating point values of +same types. + +@item gen_cvt_itof() +integer to floating point conversion. + +@item gen_cvt_ftoi() +floating point to integer conversion. + +@item gen_cvt_ftof() +floating point to floating point of different size conversion. + +@item gen_bounded_ptr_add() +@item gen_bounded_ptr_deref() +are only used for bound checking. + +@end table + +@section Optimizations done + +Constant propagation is done for all operations. Multiplications and +divisions are optimized to shifts when appropriate. Comparison +operators are optimized by maintaining a special cache for the +processor flags. &&, || and ! are optimized by maintaining a special +'jump target' value. No other jump optimization is currently performed +because it would require to store the code in a more abstract fashion. - -