ack/mach/powerpc/libem/cfu8.s

.sect .text; .sect .rom; .sect .data; .sect .bss

.sect .text

! Converts a 64-bit double into a 32-bit unsigned integer.
!
! Stack: ( double -- uint )

.define .cfu8
.cfu8:
	lfd    f1, 0(sp)                   ! f1 = value to convert
	lis    r3, ha16[.fs_80000000]
	lfs    f2, lo16[.fs_80000000](r3)  ! f2 = 2**31
	fsub   f1, f1, f2
	fctiwz f1, f1                      ! convert value - 2**31
	stfd   f1, 0(sp)                   ! integer result is in the low word, 4(sp)
	lwz    r3, 4(sp)
	xoris  r3, r3, 0x8000              ! add 2**31
	stw    r3, 4(sp)
	addi   sp, sp, 4                   ! pop 4 bytes: 8-byte double becomes 4-byte uint
	blr

.sect .rom
.fs_80000000:
	!float 2.147483648e+9 sz 4
	.data1 0117,00,00,00

! Freescale and IBM provide an example using fsel to select value or
! value - 2**31 for fctiwz.  The following code adapts Freescale's
! _Programming Environments Manual for 32-Bit Implementations of the
! PowerPC Architecture_, section C.3.2, pdf page 557.
!
! Given f2 = value clamped from 0 to 2**32 - 1, f4 = 2**31, then
!	fsub	f5, f2, f4
!	fcmpu	cr2, f2, f4
!	fsel	f2, f5, f5, f2
!	fctiwz	f2, f2
!	stfdu	f2, 0(sp)
!	lwz	r3, 4(sp)
!	blt	cr2, 1f
!	xoris	r3, r3, 0x8000
! 1: yields r3 = the converted value.
!
! Debian's clang 3.5.0-10 and gcc 4.9.2-10 don't clamp the value
! before conversion.  They avoid fsel and use the conditional branch
! to pick between 2 fctiwz instructions.
!
! PowerPC 601 lacks fsel (but kernel might trap and emulate fsel).
! PowerPC 603, 604, G3, G4, G5 have fsel.
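
The two approaches described above can be compared with a small host-side model in C. This is only a sketch under stated assumptions, not the ACK implementation: fctiwz_model, cfu8_model, cfu8_fsel_model, rom_constant_is_two_31 and cfu8_self_check are hypothetical names, and the model assumes IEEE-754 floats and a fctiwz that rounds toward zero, saturates at the signed 32-bit limits, and yields 0x80000000 for NaN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TWO_31 2147483648.0 /* 2**31 */

/* Hypothetical model of fctiwz: round toward zero, saturate at the
 * signed 32-bit limits; NaN is modeled as the most negative value. */
static int32_t fctiwz_model(double x)
{
    if (x != x) return INT32_MIN;
    if (x >= TWO_31) return INT32_MAX;
    if (x <= -TWO_31) return INT32_MIN;
    return (int32_t)x;
}

/* Model of .cfu8 as listed: convert value - 2**31, then flip the top
 * bit (xoris r3, r3, 0x8000) to add 2**31 back modulo 2**32. */
static uint32_t cfu8_model(double value)
{
    int32_t biased = fctiwz_model(value - TWO_31);
    return (uint32_t)biased ^ 0x80000000u;
}

/* Model of the fsel variant from the comment.  The caller is assumed
 * to have clamped value to [0, 2**32 - 1] already. */
static uint32_t cfu8_fsel_model(double value)
{
    double diff = value - TWO_31;               /* fsub  f5, f2, f4     */
    int below = value < TWO_31;                 /* fcmpu cr2, f2, f4    */
    double pick = (diff >= 0.0) ? diff : value; /* fsel  f2, f5, f5, f2 */
    uint32_t r = (uint32_t)fctiwz_model(pick);  /* fctiwz f2, f2        */
    if (!below)
        r ^= 0x80000000u;                       /* xoris, skipped by blt */
    return r;
}

/* .data1 0117,00,00,00 is four big-endian bytes; octal 0117 is 0x4F,
 * so the word is 0x4F000000, the single-precision encoding of 2**31. */
static int rom_constant_is_two_31(void)
{
    const uint32_t bits = 0117u << 24;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f == 2147483648.0f;
}

void cfu8_self_check(void)
{
    assert(rom_constant_is_two_31());
    assert(cfu8_model(0.0) == 0u);
    assert(cfu8_model(3000000000.0) == 3000000000u);
    assert(cfu8_fsel_model(1.5) == 1u);
    assert(cfu8_fsel_model(3000000000.0) == 3000000000u);
}
```

In this model the two variants agree on whole numbers. For a fractional input below 2**31 they differ: value - 2**31 is negative, so rounding toward zero rounds the original value up (cfu8_model(1.5) gives 2), while the fsel variant converts the unbiased value and gives 1.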
Shrink .cfu8

With my PowerBook G4, a program that converts values from 1.0 to 4000000.0 runs in about 0.32s with the old .cfu8 and 0.29s with this shrunken .cfu8.

Leave a comment about other ways to implement .cfu8.
2018-01-07 21:03:55 +00:00
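
The benchmark program itself is not shown. A workload of that shape might look like the following C sketch, where sum_conversions is a hypothetical name and the step of 1.0 between values is an assumption the commit message does not state. Each cast from double to a 32-bit unsigned integer is the operation that .cfu8 implements.

```c
#include <stdint.h>

/* Hypothetical workload: convert each double first, first + 1.0, ...
 * up to last into a 32-bit unsigned integer and sum the results, so
 * the conversions cannot be optimized away.  The step of 1.0 is an
 * assumption, not taken from the commit message. */
uint64_t sum_conversions(double first, double last)
{
    uint64_t sum = 0;
    double d;

    for (d = first; d <= last; d += 1.0) {
        uint32_t u = (uint32_t)d; /* double -> uint conversion */
        sum += u;
    }
    return sum;
}
```

Calling sum_conversions(1.0, 4000000.0) from a small main and timing the run with the shell's time command would give a comparable measurement.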

Archival checkin (semi-working code). 2007-11-02 18:56:58 +00:00

In PowerPC libem, use the new features of our assembler.

The new features are the hi16/lo16 and ha16/lo16 syntax for relocations, and the extended mnemonics like "blr".

Use ha16/lo16 to load some double floats with 2 instructions (lis/lfd) instead of 3 (lis/ori/lfd).

Use the extended names for branches, comparisons, and bit rotations, so I can more easily read the code. The new names often encode the same machine instructions as the old names, except in a few places where I changed the instructions.

Stop using andi. when we don't need to set cr0. In inn.s, I change andi. to extrwi to extract the same bits. In los.s and sts.s, I change "andi. r3, r3, ~3" to "clrrwi r3, r3, 2". This avoids setting cr0 and also stops clearing the high 16 bits of r3.

In csa.s, los.s, sts.s, I change some comparisons and right shifts from signed to unsigned (cmplw, cmplwi, srwi), because the sizes are unsigned. In inn.s, the right shift can be signed (sraw) or unsigned (srw), but I use srw because we don't need the carry bit.

In fef8.s, I save an instruction by using rlwinm instead of addis/andc to rlwinm to clear a field. The code no longer kills r7. In both fef8.s and fif8.s, I remove the list of killed registers.

Also remove some whitespace from ends of lines.
2017-01-23 22:16:39 +00:00
Add fef 4, fif 4. Improve fef 8, fif 8. Other float changes.

When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now fix two of my mistakes:
- When checking for zero, `extrwi r6, r3, 22, 12` needs to be `extrwi r6, r3, 20, 12`. There are only 20 bits to extract.
- After the multiplication by 2**64, I forgot to put the fraction in [0.5, 1) or (-1, -0.5] by setting IEEE exponent = 1022.

Teach fif 8 about signed zero and NaN.

In ncg/table, change cmf so NaN is not equal to any value, and comment why ordered comparisons don't work with NaN. Also add cost for fctiwz, remove extra `uses REG`.

Edit comment in cfu8.s because the conditional branch might be before or after fctiwz.
2018-01-22 19:04:15 +00:00