+/*
+ Benchmarks on dhry.c 2.1 with 32766 loops and a 10 ms clock
+ (ticks: elapsed 10 ms clock ticks; dhry: Dhrystones/s; size: in hex):
+                                         ticks  dhry  size
+ Base with asm strcpy / strcmp / memcpy   23198   141  1A14
+ Improved WORD push                       22784   144  19AE
+ With label1 on                           22694   144  197E
+ With label2 on                           22743   144  198A
+ With label3 on                           22776   144  1999
+ With label4 on                           22776   144  1999
+ With all 'label' on                      22661   144  196F
+ With loopInvariant on                    20919   156  19AB
+ With loopInduction on                   breaks     -  198B
+ With all working on                      20796   158  196C
+ Slightly better genCmp(signed)           20597   159  195B
+ Better reg packing, first peephole       20038   163  1873
+ With assign packing                      19281   165  1849
+ 5/3/00                                   17741   185  17B6
+ With reg params for mul and div          16234   202  162D
+
+ 1. Starting again on 3 Aug 01            34965    93  219C
+    - No asm strings
+    - Includes long mul/div in code
+ 2. Optimised memcpy for acc use          32102   102  226B
+ 3. Optimised strcpy for acc use          27819   117  2237
+ 3a. Optimised memcpy function
+ 4. Optimised strcmp function             21999   149  2294
+ 5. Optimised strcmp further              21660   151  228C
+ 6. Optimised memcpy by unrolling         20885   157  2201
+ 7. After turning loop induction on       19862   165  236D
+ 8. Same as 7 but with more info
+ 9. With asm optimised strings            17030   192  2223
+
+ 10 and below are with asm strings off.
+
+ Estimated advantage of turning on regparams (costs in Z80 T-states):
+ 1. Cost of pushing the arguments
+    A favourable case is pushing a constant:
+    - ld hl,#n; push hl: (10+11)*nargs
+ 2. Cost of reading them back off the stack
+    Using asm with ld hl, etc.:
+    - ld hl,#2; add hl,sp; (ld c,(hl); inc hl; ld b,(hl); inc hl)*nargs
+      10+11+(7+6+7+6)*nargs
+ 3. Cost of fixing up the stack after the call
+    - (pop hl)*nargs
+      10*nargs
+
+ So the cost per call is (10+11+7+6+7+6+10)*nargs+10+11
+   = 57*nargs+21
+   = 135 for mul, div, strcmp and strcpy (nargs = 2)
+ Saving of (98298+32766+32766+32766)*135 = 26540460 T-states per run.
+ At 192 d/s for 682411768 T-states, the speed only rises to about 200 d/s (~4%).
+*/
+