ack/mach/powerpc/libem/sar4.s

.sect .text

! Store to bounds-checked array.
!
! Stack: ( element array-adr index descriptor-adr -- )

.define .sar4
.sar4:
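	mfspr r10, lr		! save return address; each bl overwrites lr
	bl .aar4		! bounds-check the index, leave the element address on the stack
	! pass r3 = size from .aar4 to .sts4
	bl .sts4		! pop the address, then store r3 bytes from the stack to it
	mtspr lr, r10		! restore return address (relies on r10 surviving both calls)
	blr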
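
The routine above only chains two helpers, so its behaviour is easiest to see in a small model. The following Python sketch is not part of libem. It assumes the usual EM array descriptor layout (lower bound, upper bound minus lower bound, element size, each a 4-byte word) and big-endian memory. The names `aar4`, `sts4`, `sar4`, `load_word` and `ArrayBoundsError` are illustrative stand-ins, and the stack is modelled as a Python list whose last item is the top.

```python
import struct


class ArrayBoundsError(Exception):
    """Stands in for the array-bound trap (assumption: the real code traps)."""


def load_word(mem, addr):
    # Read one signed 32-bit big-endian word from the modelled memory.
    return struct.unpack_from(">i", mem, addr)[0]


def aar4(mem, stack):
    """Pop descriptor-adr, index, array-adr; push the element address.

    Returns the element size, mirroring how .aar4 leaves it in r3.
    """
    desc = stack.pop()
    index = stack.pop()
    base = stack.pop()
    lower = load_word(mem, desc)       # assumed field 0: lower bound
    span = load_word(mem, desc + 4)    # assumed field 1: upper - lower
    size = load_word(mem, desc + 8)    # assumed field 2: bytes per element
    offset = index - lower
    if offset < 0 or offset > span:
        raise ArrayBoundsError(index)
    stack.append(base + offset * size)
    return size


def sts4(mem, stack, size):
    """Pop the target address, then pop size/4 words and store them there."""
    addr = stack.pop()
    for i in range(0, size, 4):
        word = stack.pop()
        struct.pack_into(">I", mem, addr + i, word & 0xFFFFFFFF)


def sar4(mem, stack):
    # sar 4 is just .aar4 followed by .sts4, with the size handed across.
    size = aar4(mem, stack)
    sts4(mem, stack, size)


# Usage: a 4-element array of 4-byte items, indices 1..4, stored at address 16.
mem = bytearray(64)
struct.pack_into(">iii", mem, 0, 1, 3, 4)   # descriptor at address 0
stack = [0x1234, 16, 3, 0]                  # element, array-adr, index, descriptor-adr
sar4(mem, stack)
print(mem[24:28].hex())                     # element 3 lives at 16 + (3 - 1) * 4
```

The model keeps the same division of labour as the assembly: only `aar4` reads the descriptor, and `sts4` needs nothing but the size and the stack.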

Add the missing .lar4 and .sar4 for powerpc. Inspired by the sparc code (mach/sparc/libem/lar.s). My powerpc code might still have bugs, but it's enough for examples/hilo.mod to work. May need to 'make clean' or touch a build.lua file, so ackbuilder can notice the new lar4.s and sar4.s files and build them.
2016-09-18 03:55:55 +00:00

Use .los4 in lar 4 and .sts4 in sar 4.

Our libem had two implementations of loading a block from a stack, one for lar 4 and one for los 4. Now lar 4 and los 4 share the code in .los4. Likewise, sar 4 and sts 4 share the code in .sts4.

Rename .los to .los4 and .sts to .sts4, because they implement los 4 and sts 4. Remove the special case for loading or storing 4 bytes, because we can do it with 1 iteration of the loop. Remove the lines to "align size" where the size must already be a multiple of 4.

Fix the upper bound check in .aar4. Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the real stack, except that .los4 and .sts4 take the size in register r3. Have .aar4 set r3 to the size of the array element. So lar 4 is just .aar4 then .los4, and sar 4 is just .aar4 then .sts4.

ncg no longer calls .lar4 and .sar4 in libem, because it inlines the code; but I keep .lar4 and .sar4 in libem, because mcg references them. They might or might not work in mcg.
2017-02-13 20:22:00 +00:00

In PowerPC libem, use the new features of our assembler.

The new features are the hi16/lo16 and ha16/lo16 syntax for relocations, and the extended mnemonics like "blr". Use ha16/lo16 to load some double floats with 2 instructions (lis/lfd) instead of 3 (lis/ori/lfd). Use the extended names for branches, comparisons, and bit rotations, so I can more easily read the code. The new names often encode the same machine instructions as the old names, except in a few places where I changed the instructions.

Stop using andi. when we don't need to set cr0. In inn.s, I change andi. to extrwi to extract the same bits. In los.s and sts.s, I change "andi. r3, r3, ~3" to "clrrwi r3, r3, 2". This avoids setting cr0 and also stops clearing the high 16 bits of r3.

In csa.s, los.s, sts.s, I change some comparisons and right shifts from signed to unsigned (cmplw, cmplwi, srwi), because the sizes are unsigned. In inn.s, the right shift can be signed (sraw) or unsigned (srw), but I use srw because we don't need the carry bit.

In fef8.s, I save an instruction by using rlwinm instead of addis/andc to rlwinm to clear a field. The code no longer kills r7. In both fef8.s and fif8.s, I remove the list of killed registers. Also remove some whitespace from ends of lines.
2017-01-23 22:16:39 +00:00
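
The remark about "andi. r3, r3, ~3" clearing the high 16 bits follows from the instruction format: `andi.` takes a 16-bit immediate that is zero-extended, while `clrrwi` only clears the low bits of the full word. The Python sketch below models just the register result of each form. It does not model cr0, and it assumes the assembler keeps only the low 16 bits of `~3`. The function names and the sample value are illustrative, not taken from libem.

```python
MASK32 = 0xFFFFFFFF


def andi(rs, imm):
    """Register result of `andi.`: AND with a zero-extended 16-bit immediate."""
    return rs & (imm & 0xFFFF)


def clrrwi(rs, n):
    """Register result of `clrrwi`: clear the low n bits, keep the rest."""
    return rs & (MASK32 << n) & MASK32


size = 0x00012346                 # hypothetical size above 64 KiB, not a multiple of 4

print(hex(andi(size, ~3)))        # high half is lost along with the low 2 bits
print(hex(clrrwi(size, 2)))       # only the low 2 bits are cleared
```

For sizes below 0x10000 the two forms agree, which is why the difference only matters for large blocks.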