Efficient multiple indirection in 6502 code


Issue

I'm looking at a 6502 program that has multiple arrays of bytes of varying lengths (each holds the sound effect data for a particular voice). Currently the code explicitly iterates through the first array (if queued), then the second, and so on. Each voice also has its own set of variables for volume, delay, etc., so the code is written against these hard-coded labels.

I'd like to roll this into a loop, indexing into these additional variables and the sound effect data. Indexing into the variables is fairly straightforward, using indexed addressing, but indexing into the sound effect data involves a lot more work, and I'm wondering if I'm missing something in the application of indexed indirect and indirect indexed addressing.

Below is a self-contained example of what I'm doing at the moment. The part I'd like to tighten up, if possible, is the code in LoadFromTable, ideally with some use of both X and Y addressing:

  .equ  Ptr0,  0x80
  .equ  Ptr1,  0x81

  .org  0xFE00

  .org  0x0000

Init:
  LDX #0xFF
  TXS

Main:
  LDX #0x00
  LDY #0x00
  JSR LoadFromTable
  ; A should be 'H',  0x48

  LDX #0x01
  LDY #0x00
  JSR LoadFromTable
  ; A should be 'B',  0x42

  LDX #0x02
  LDY #0x02
  JSR LoadFromTable
  ; A should be 'A',  0x41

  JMP Main

LoadFromTable:
  TXA           ; Double outer index to account for 16 bit pointers
  ASL           ;   "
  TAX           ;   "
  LDA Table,X   ; Load the low byte of the array into a pointer
  STA Ptr0      ;   "
  INX           ; Load the high byte of the array into the pointer
  LDA Table,X   ;   "
  STA Ptr1      ;   "
  LDA (Ptr0),Y  ; Load the character at the inner index into the array
  RTS

  .org  0x0040

Table:
  .word Item0
  .word Item1
  .word Item2

  .org  0x0080

Item0:
  .byte 'H', 'E', 'L', 'L', 'O', 0x00

Item1:
  .byte 'B', 'O', 'N', 'J', 'O', 'U', 'R', 0x00

Item2:
  .byte 'C', 'I', 'A', 'O', 0x00

  .org  0x00FA

  .word Init
  .word Init
  .word Init

Implementation

Taking on board the split table idea from @NickWestgate and hoisting out the initial pointer calculation as noted by @Michael, I've moved from something like this:

PROCESS_MUSIC:
  ; ...
  BNE   MusDoB

MusChanA:
  ; ...
  LDA   MUSICA,X
  BNE   MusCmdToneA
  ; ...
  JMP   MusChanA

MusCmdToneA:
  ; ...
  BNE   MusNoteA
  ; ...

MusNoteA:
  ; ...
  LDA   MUSICA,X
  ; ...

MusDoB:
  ; ...
  BNE   MusDoDone

MusChanB:
  ; ...
  LDA   MUSICB,X
  BNE   MusCmdToneB
  ; ...
  JMP   MusChanB

MusCmdToneB:
  ; ...
  BNE   MusNoteB
  ; ...

MusNoteB:
  ; ...

MusDoDone:
  RTS

to this more generalised subroutine:

PROCESS_MUSIC:
  LDX #0x01

PerChannel:
  ; ...
  BNE EndPerChannel
  LDA MusicTableL,X
  STA tmp0
  LDA MusicTableH,X
  STA tmp1

MusChan:
  ; ...
  LDA (tmp0),Y
  BNE MusCmdTone
  ; ...
  BEQ MusChan

MusCmdTone:
  ; ...
  BNE MusNote
  ; ...

MusNote:
  ; ...
  LDA (tmp0),Y
  ; ...

EndPerChannel:
  DEX 
  BPL PerChannel
  RTS

with the addition of the following tables:

MusicTableL:
    .byte <MUSICA
    .byte <MUSICB

MusicTableH:
    .byte >MUSICA
    .byte >MUSICB

This removes the need for the LoadFromTable function I'd originally been using, and seems much cleaner overall.


There are 2 answers

Nick Westgate (best answer)

Here are a few ideas. One is passing in an index that's already doubled (i.e. if you can arrange that, or it might already be in the accumulator at some earlier stage).
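
As a rough sketch of the pre-doubled index idea: if the caller can supply X as twice the item number, the TXA/ASL/TAX sequence (6 cycles, 3 bytes) drops out of the routine. The name LoadFromTable2 is hypothetical, and the caller shown is only an assumption about how the doubling might be arranged at assembly time:

LoadFromTable2:
  LDA Table,X   ; X = 2 * item number on entry: low byte of the item's address
  STA Ptr0
  INX
  LDA Table,X   ; High byte of the item's address
  STA Ptr1
  LDA (Ptr0),Y  ; Load the character at the inner index
  RTS

  ; Caller: item 2, character 2
  LDX #0x04     ; 2 * 2, doubled when the constant is written
  LDY #0x02
  JSR LoadFromTable2
  ; A should be 'A',  0x41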

Another is splitting up the address tables:

LoadFromTable:
  LDA TableL,X ; Load the low byte of the array into a pointer
  STA Ptr0      ;   "
  LDA TableH,X ; Load the high byte of the array into the pointer
  STA Ptr1      ;   "
  LDA (Ptr0),Y  ; Load the character at the inner index into the array
  RTS

TableL:
  .byte <Item0
  .byte <Item1
  .byte <Item2

TableH:
  .byte >Item0
  .byte >Item1
  .byte >Item2

If you can't split up the tables, you can probably still get rid of an INX by doing:

  LDA Table,X   ; Load the low byte of the array into a pointer
  STA Ptr0      ;   "
  LDA Table+1,X ; Load the high byte of the array into the pointer
  STA Ptr1      ;   "

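Self-modifying code might be useful, provided the routine runs from RAM. Whether it lives on page zero will be a factor, since stores to zero page are a byte shorter and a cycle faster than absolute stores: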

  LDA Table,X   ; Load the low byte of the array into a pointer
  STA Load+1    ;   "
  LDA Table+1,X ; Load the high byte of the array into the pointer
  STA Load+2    ;   "
Load:
  LDA $FFFF,Y   ; Load the character at the inner index into the array

You could also see whether adding Y to the pointer as you store it saves any cycles. That might depend on the most common path taken (i.e. whether the addition usually carries, so that it has to INC Ptr1/Load+2).
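
A minimal sketch of folding Y into the pointer, assuming the split TableL/TableH tables from above. The names LoadFromTableAt and NoCarry are hypothetical. Whether this wins overall depends on how often the pointer is reused after being set up:

LoadFromTableAt:
  TYA             ; A = inner index
  CLC
  ADC TableL,X    ; Low byte of the array's address plus the inner index
  STA Ptr0
  LDA TableH,X    ; High byte (LDA and STA leave the carry untouched)
  STA Ptr1
  BCC NoCarry     ; No carry out of the low byte, so the pointer is complete
  INC Ptr1        ; Otherwise bump the high byte
NoCarry:
  LDY #0x00       ; The offset now lives in the pointer itself
  LDA (Ptr0),Y
  RTS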

supercat

If you're trying to generate real-time audio on a 1 MHz 6502, I've done four-voice music using 40 cycles per sample plus the time to actually feed the DAC(s), which in my case was two zero-page stores to the AUDV0 and AUDV1 registers.

The key was to use a "rolling" sequence of four code snippets of the form:

; Carry must be clear on entry; will be clear on exit
; Acc, Y, and other flags ignored on entry; trashed on exit
; X register ignored and left alone
ldy phase1
lda (wave1a),y
ldy phase0
adc (wave0d),y
sta AUDV0
lda (newPhase0),y
sta phase0
ldy phase2
lda (wave2c),y
ldy phase3
adc (wave3d),y
sta AUDV1
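
For reference, here is the same snippet annotated with standard 6502 cycle counts. This tally assumes every operand is in zero page and that no (zp),y read crosses a page boundary (a crossing would add one cycle to that read):

ldy phase1          ; 3
lda (wave1a),y      ; 5
ldy phase0          ; 3
adc (wave0d),y      ; 5
sta AUDV0           ; 3  (DAC store)
lda (newPhase0),y   ; 5
sta phase0          ; 3
ldy phase2          ; 3
lda (wave2c),y      ; 5
ldy phase3          ; 3
adc (wave3d),y      ; 5
sta AUDV1           ; 3  (DAC store)
                    ; Total: 46 cycles, of which 6 are the two DAC stores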

This approach made it possible to produce four-voice music over a five-octave range, using half-rate and quarter-rate playback for the bottom two octaves, but ties up 40 bytes worth of zero-page pointers (almost a third of the total RAM on the 2600!). If a choice of playback rates was not required, and one could afford to pad each sample out by an extra 256 bytes, one could have the main sample loop be something like:

 ldy outCounter
 clc
 bmi useSetB
useSetA:
 lda (wave0a),y
 adc (wave1a),y
 adc (wave2a),y
 adc (wave3a),y
 inc outCounter
 bne storeAndDone ; Will be 1-128
useSetB:
 lda (wave0b),y
 adc (wave1b),y
 adc (wave2b),y
 adc (wave3b),y
 inc outCounter
 nop ; Equalize time
storeAndDone:
 sta DACoutput
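
The nop is what keeps the two paths the same length. A rough timing table, assuming outCounter is in zero page, no branch or indexed read crosses a page boundary, and standard 6502 cycle counts:

                     ; set A path     set B path
 ; ldy outCounter    ;   3              3
 ; clc               ;   2              2
 ; bmi useSetB       ;   2 (not taken)  3 (taken)
 ; lda/adc (zp),y x4 ;  20             20
 ; inc outCounter    ;   5              5
 ; bne / nop         ;   3 (taken)      2
 ; total before sta  ;  35             35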

Two other pieces of code would need to run sometime between when the upper bit of outCounter is set and when it's clear again, and again between when it's clear and when it's re-set.

; Run when outCounter reaches 128
 ldx #6 ; Counts by 2 for each voice
fixLp1:
 lda wave0b,x
 sta wave0a,x
 dex
 dex
 bpl fixLp1

; Run when outCounter wraps back to 0
 ldx #6
fixLp2:
 sec
 lda wave0a,x
 sbc top0,x
 lda wave0a+1,x
 sbc top0+1,x
 bcc notTopYet
 lda wave0a,x
 sbc length0,x
 sta wave0a,x
 lda wave0a+1,x
 sbc length0+1,x
 sta wave0a+1,x
 dex
 dex
 bpl fixLp2
 bmi done ; X is negative here, so this is always taken
notTopYet:
 lda wave0a,x
 sta wave0a,x
 lda wave0a+1,x
 sta wave0a+1,x
 dex
 dex
 bpl fixLp2
done:

The latter portion of the code would be fairly long, but it would only need to be run once every 256 samples, and could accommodate arbitrary looping sections.