My i386 code from 893df4b gave the wrong sign to some 8-byte
remainders. Fix by splitting .dvi8 and .rmi8 so each has its own code
to pick the sign. They and .dvu8 and .rmu8 share a private sub
.divrem8 for unsigned division.
Improve the i386 code by using instructions like _bsr_ and _shrd_.
Change the helpers to yield a quotient in ebx:eax or a remainder in
ecx:edx; this seems more convenient, because _div_ puts its quotient
in eax and remainder in edx.