Some of the implementation choices
----------------------------------

gbz80:

Load from direct space:
  Alternatives:
  1.  Via HL
	ld hl,#dir
	ld x,(hl)
	inc hl
	ld y,(hl)
  2.  Via a
  	ld a,(dir)
	ld x,a
	ld a,(dir+1)
	ld y,a
  1 is bad when x or y involve HL (case 1b).
  Cycles for an n byte load, by operand size in bits:
                                8       16      32
     1  = 12 + n*(8+8) - 8      20      36      68
     1b = n*(12+12+8)           32      64      128
     2  = n*(16+4)              20      40      80
  So choose 2.

  Hmm.  (2) is too hard to support in the current model.
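
  The table above can be re-derived with a short script.  This is only a
  sketch: the function names are made up for illustration, and the cycle
  figures are copied straight from the formulas in these notes rather than
  from a timing reference.

```python
# Cycle estimates for loading an n-byte value from direct space (gbz80).
# All constants come from the formulas in the notes; names are hypothetical.

def cost_via_hl(n):
    # ld hl,#dir (12), then ld r,(hl) (8) + inc hl (8) per byte,
    # minus the last inc hl (8), which is never needed.
    return 12 + n * (8 + 8) - 8

def cost_via_hl_clobbered(n):
    # Case 1b: destination involves HL. Per-byte figure taken as-is
    # from the notes (12 + 12 + 8).
    return n * (12 + 12 + 8)

def cost_via_a(n):
    # ld a,(dir+i) (16) + ld r,a (4) per byte.
    return n * (16 + 4)

def cost_table():
    # Map operand size in bits to (case 1, case 1b, case 2) cycle counts.
    rows = {}
    for bits in (8, 16, 32):
        n = bits // 8
        rows[bits] = (cost_via_hl(n), cost_via_hl_clobbered(n), cost_via_a(n))
    return rows

if __name__ == "__main__":
    for bits, (c1, c1b, c2) in cost_table().items():
        print(bits, c1, c1b, c2)
```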

On stack word push
   1.	 lda  hl,x(sp)
	 ld   a,(hl+)
	 ld   h,(hl)
	 ld   l,a
	 push hl
   2.	 lda  hl,x(sp)
	 ld   e,(hl)
	 inc  hl
	 ld   d,(hl)
	 push de
   1 = d + 8 + 8 + 4
   2 = d + 8 + 8 + 8

Structure member get:
   Normally fetch pair
   Then add pair and constant with result in hl

1:
   ld	l,c	; 4
   ld	h,b	; 4
   inc  hl ..	; 6 each, n times	= 8 + 6n
2:
   ld	l,c	; 4
   ld	h,b	; 4
   ld	a,#0x06	; 7
   add	a,c	; 4
   ld	l,a	; 4
   ld	a,#0x00 ; 7
   adc	a,b	; 4
   ld	h,a	; 4	= 38
3: (only when result = hl and left, right = pair, const)
   ld	   hl,#const	; 10
   add	   hl,pair	; 11	= 21

So (1) is best for n <= 2, (2) is just bad, and (3) is best for n > 2.
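
The break-even point can be checked mechanically.  The sketch below assumes
n is the constant offset being added and that only the inc sequence scales
with it; the names are hypothetical and the cycle counts are the ones listed
above.

```python
# Pick the cheapest of the three structure-member sequences for offset n.
# Cycle counts are taken from the notes; names are hypothetical.

COST_BYTEWISE_ADD = 38   # sequence 2: copy pair, then add/adc through A
COST_ADD_HL = 21         # sequence 3: ld hl,#const (10) + add hl,pair (11)

def cost_inc(n):
    # Sequence 1: ld l,c (4) + ld h,b (4), then n times inc hl (6).
    return 8 + 6 * n

def best_sequence(n):
    # Return the number (1, 2 or 3) of the sequence with the fewest cycles.
    costs = {1: cost_inc(n), 2: COST_BYTEWISE_ADD, 3: COST_ADD_HL}
    return min(costs, key=costs.get)

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, cost_inc(n), best_sequence(n))
```

Sequence 3 is only legal under the restriction noted above (result in hl,
operands a pair and a constant), which this sketch does not model.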

How about:
    pair = pair + constant:
1:
    ld	a,#0x08	; 7
    add	a,c	; 4
    ld	c,a	; 4
    ld	a,#0x00	; 7
    adc	a,b	; 4
    ld	b,a	; 4	= 30
2:
	ld	hl,#const	; 10
	add	hl,pair		; 11
	ld	c,l		; 4
	ld	b,h		; 4	= 29
(2) wins by only one cycle, and it clobbers HL.  If I cache HL later, that will throw away the advantage.  Choose 1.

PlusIncr on pairs:
1:
	 inc	pair		; 6 	= 6n
2:
	ld	a,#0x04		; 7
	add	a,c		; 4
	ld	c,a		; 4
	ld	a,#0x00		; 7
	adc	a,b		; 4
	ld	b,a		; 4 	= 30
So for n <= 5 (1) is at least as fast (the two tie at n = 5).
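
A quick way to confirm the threshold, as a sketch only: the function names
are invented, and the two cycle figures are the ones from the listings above.

```python
# Largest increment n for which a chain of "inc pair" is no slower than
# the fixed-cost add/adc sequence. Names are hypothetical.

INC_PAIR = 6        # cycles per inc pair
BYTEWISE_ADD = 30   # cycles for the ld/add/ld/ld/adc/ld sequence

def inc_chain_cost(n):
    return INC_PAIR * n

def max_increment_for_inc_chain():
    # Grow n while one more inc still fits within the fixed cost.
    n = 0
    while inc_chain_cost(n + 1) <= BYTEWISE_ADD:
        n += 1
    return n

if __name__ == "__main__":
    print(max_increment_for_inc_chain())
```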

Frame pointer:
It's nice to use HL as the temp register, but what if I used it as the
frame pointer instead of ix?

Instead of:
	ld	e,5(ix)		; 19
	ld	d,6(ix)		; 19	= 38

use:
	ld	hl,#5		; 10
	add	hl,sp		; 11
	ld	e,(hl)		; 7
	inc	hl		; 6
	ld	d,(hl)		; 7	= 41

Things get better when you access the same set repeatedly, as you get rid
of the setup.  But they get worse when both ops are on the stack/in
direct space.  Easiest this way for now.  iy may benefit...