Some of the implementation choices
----------------------------------
Load from direct space:
1 is bad when x or y involves HL (1b)
1  = 12 + n*(8+8) - 8      20   36   68
1b = n*(12+12+8)           32   64  128
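The two cost formulas can be tabulated with a short script. The n values 1, 2 and 4 are inferred, not stated: they are the values that reproduce the listed columns (20/36/68 and 32/64/128). The function names are illustrative only.

```python
# Cycle-cost formulas for loading from direct space, copied from the notes.
# n = number of bytes moved (assumed; n = 1, 2, 4 reproduce the listed columns).


def cost_1(n: int) -> int:
    """Form 1: 12 + n*(8+8) - 8 cycles."""
    return 12 + n * (8 + 8) - 8


def cost_1b(n: int) -> int:
    """Form 1b, used when x or y involves HL: n*(12+12+8) cycles."""
    return n * (12 + 12 + 8)


if __name__ == "__main__":
    print(f"{'n':>3} {'1':>5} {'1b':>5}")
    for n in (1, 2, 4):
        print(f"{n:>3} {cost_1(n):>5} {cost_1b(n):>5}")
```

Form 1b grows at twice the per-byte rate of form 1, which is why involving HL makes form 1 unattractive.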
Hmm. (2) is too hard to support in the current model.
Then add the pair and constant, with the result in hl.
inc hl ..               ; 6 = 8 + 6n
alt: (only when result = hl and left, right = pair, const)
So (1) is best for n <= 2, (2) is just bad, and (3) is good for n > 2.
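The rule of thumb above can be written down as a tiny selection function. This is a sketch only: `inc_chain_cost` and `choose_form` are hypothetical helpers, the cost is the `8 + 6n` annotation from the `inc hl ..` line, and the forms are referred to by the labels used in these notes rather than by the instructions they emit. It assumes n is a non-negative constant.

```python
# Hypothetical helpers encoding the notes' choice for "hl = pair + constant n".
# Forms are named by the notes' own labels; n is assumed to be >= 0.


def inc_chain_cost(n: int) -> int:
    """Cost annotated on the `inc hl ..` sequence: 8 + 6n cycles."""
    return 8 + 6 * n


def choose_form(n: int) -> str:
    """(1) is best for n <= 2, (3) for n > 2; (2) is never chosen."""
    return "(1)" if n <= 2 else "(3)"


if __name__ == "__main__":
    for n in range(6):
        print(n, inc_chain_cost(n), choose_form(n))
```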
pair = pair + constant:
One cycle. If I cache HL later, that will throw away the advantage. Choose 1.
So for n <= 5, (1) is better.
It's nice to use HL as the temp register, but what if I used it as the
frame pointer instead of ix?
Things get better when you access the same set over and over, as you get rid
of the setup. But they get worse when both ops are on the stack/in
direct space. Easiest this way for now. iy may benefit...
Why is there the whole xor thing going on?
left    right   l-r     c       expect
With the topmost bits XORed
How about using the sign bit and no XOR on r-l?
0       01h     01h     false - works
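Assuming the "xor thing" is the usual trick of flipping bit 7 of both operands so that an unsigned subtract's carry (borrow) gives the signed ordering, the sketch below checks that trick against the sign-bit-only alternative over every pair of 8-bit values. Python stands in for the CPU flags here, and `gt_xor_carry`, `gt_sign_of_diff` and `failures` are illustrative names, not functions in the compiler.

```python
# Model of an 8-bit signed "left > right" test on a CPU whose subtract is
# unsigned and reports a carry (borrow) flag, as the Z80's SUB/CP do.


def to_signed8(v: int) -> int:
    """Interpret an 8-bit pattern as two's complement."""
    return v - 0x100 if v & 0x80 else v


def gt_xor_carry(left: int, right: int) -> bool:
    """Flip bit 7 of both operands, then compute right - left unsigned.
    A borrow means right < left unsigned, i.e. left > right signed."""
    a = left ^ 0x80
    b = right ^ 0x80
    return b - a < 0


def gt_sign_of_diff(left: int, right: int) -> bool:
    """No XOR: use bit 7 of the 8-bit result of right - left as the answer."""
    return bool(((right - left) & 0xFF) & 0x80)


def failures(test) -> list:
    """All (left, right) pairs where `test` disagrees with a true signed compare."""
    return [
        (l, r)
        for l in range(256)
        for r in range(256)
        if test(l, r) != (to_signed8(l) > to_signed8(r))
    ]


if __name__ == "__main__":
    print("xor + carry failures:", len(failures(gt_xor_carry)))
    bad = failures(gt_sign_of_diff)
    print("sign-bit failures:", len(bad), "e.g.", bad[:3])
```

The sign-bit version agrees on small operands but goes wrong whenever right - left overflows 8 bits, for example left = 7Fh (127), right = 80h (-128): the difference wraps to 01h, the sign bit is clear, and the test wrongly reports left > right as false.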
        jp      nz,00129$       ; 10 - 72
        ld      -3(ix),a        ; 19 = 60
        ld      -3(ix),h        ; 19 = 59
Same argument as above: not worth the extra cycle.
Pending optimisations:
iTemp4 = something in direct space
Cleaning up the arguments to a call:
So for 8 bytes and above, use the first form.
Same cost. Not worth it, although it does free up HL.
Shift left signed on HL
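A left shift produces the same bit pattern for signed and unsigned operands, so a signed shift left of HL can use the same code as an unsigned one; on the Z80, doubling HL with `add hl,hl` is one way to shift it left by one bit. The sketch below only checks that equivalence in 16-bit arithmetic. It assumes nothing about how the compiler actually emits the shift, and `shl_hl` and `add_hl_hl` are illustrative names.

```python
# 16-bit model of shifting the HL register pair left by one bit.

MASK16 = 0xFFFF


def shl_hl(hl: int) -> int:
    """Shift left by one, truncated to 16 bits (same for signed and unsigned)."""
    return (hl << 1) & MASK16


def add_hl_hl(hl: int) -> int:
    """What `add hl,hl` leaves in HL: HL + HL truncated to 16 bits."""
    return (hl + hl) & MASK16


def to_signed16(v: int) -> int:
    """Interpret a 16-bit pattern as two's complement."""
    return v - 0x10000 if v & 0x8000 else v


if __name__ == "__main__":
    same = all(shl_hl(v) == add_hl_hl(v) for v in range(0x10000))
    print("shift == add hl,hl for every 16-bit value:", same)
    print("-1 << 1 =", to_signed16(shl_hl(0xFFFF)))
```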