patch originally made to prove correctness (comparing stages)
with tinycc compiling gcc 2.95.3 which would assign registers
differently (but still correctly) when compiled with tcc without
this option).
Also: fixes get_temp_local_var() which on 32-bit systems
happened to return a temporary location that was still
in use because its offset was changed on the vstack
(incremented by four in gv() to load the second register
of a long long).
Also: optimize vrot-t/b (slightly) by using one memmove
instead of moving elements one by one in a loop.