Is this part of a real IFFT process really optimal?

96 views Asked by At

When calculating (I)FFT it is possible to calculate "N*2 real" data points using a ordinary complex (I)FFT of N data points. Not sure about my terminology here, but this is how I've read it described. There are several posts about this on stackoverflow already.

This can speed things up a bit when only dealing with such "real" data which is often the case when dealing with for example sound (re-)synthesis.

This increase in speed is offset by the need for a pre-processing step that somehow... uhh... fidaddles? the data to achieve this. Look I'm not even going to try to convince anyone I fully understand this but thanks to previously mentioned threads, I came up with the following routine, which does the job nicely (thank you!).

However, on my microcontroller this costs a bit more than I'd like even though trigonometric functions are already optimized with LUTs.

But the routine itself just looks like it should be possible to optimize mathematically to minimize processing. To me it seems similar to plain 2d rotation. I just can't quite wrap my head around it, but it just feels like this could be done with fewer both trigonometric calls and arithmetic operations.

I was hoping perhaps someone else might easily see what I don't and provide some insight into how this math may be simplified.

This particular routine is for use with IFFT, before the bit-reversal stage.

pseudo-version:

  INPUT  
  MAG_A/B = 0 TO 1
  PHA_A/B = 0 TO 2PI
  INDEX   = 0 TO PI/2

  r = MAG_A * sin(PHA_A)
  i = MAG_B * sin(PHA_B)
  rsum = r + i
  rdif = r - i
  r = MAG_A * cos(PHA_A)
  i = MAG_B * cos(PHA_B)
  isum = r + i
  idif = r - i
  r = -cos(INDEX)
  i = -sin(INDEX)
  rtmp = r * isum + i * rdif
  itmp = i * isum - r * rdif
  OUTPUT rsum + rtmp
  OUTPUT itmp + idif
  OUTPUT rsum - rtmp
  OUTPUT itmp - idif

original working code, if that's your poison:

  void fft_nz_set(fft_complex_t complex[], unsigned bits, unsigned index, int32_t mag_lo, int32_t pha_lo, int32_t mag_hi, int32_t pha_hi) {
    unsigned size = 1 << bits;
    unsigned shift = SINE_TABLE_BITS - (bits - 1);
    unsigned n = index;        // index for mag_lo, pha_lo
    unsigned z = size - index; // index for mag_hi, pha_hi
    int32_t rsum, rdif, isum, idif, r, i;
    r = smmulr(mag_lo,   sine(pha_lo)); // mag_lo * sin(pha_lo)
    i = smmulr(mag_hi,   sine(pha_hi)); // mag_hi * sin(pha_hi)
    rsum = r + i; rdif = r - i;
    r = smmulr(mag_lo, cosine(pha_lo)); // mag_lo * cos(pha_lo)
    i = smmulr(mag_hi, cosine(pha_hi)); // mag_hi * cos(pha_hi)
    isum = r + i; idif = r - i;
    r = -sinetable[(1 << SINE_BITS) - (index << shift)]; // cos(pi_c * (index / size) / 2)
    i = -sinetable[index << shift];                      // sin(pi_c * (index / size) / 2)
    int32_t rtmp = smmlar(r, isum, smmulr(i, rdif)) << 1; // r * isum + i * rdif
    int32_t itmp = smmlsr(i, isum, smmulr(r, rdif)) << 1; // i * isum - r * rdif
    complex[n].r = rsum + rtmp;
    complex[n].i = itmp + idif;
    complex[z].r = rsum - rtmp;
    complex[z].i = itmp - idif;
  }

  // For reference, this would be used as follows to generate a sawtooth (after IFFT)

  void synth_sawtooth(fft_complex_t *complex, unsigned fft_bits) {
    unsigned fft_size = 1 << fft_bits;
    fft_sym_dc(complex, 0, 0); // sets dc bin [0]
    for(unsigned n = 1, z = fft_size - 1; n <= fft_size >> 1; n++, z--) {
      // calculation of amplitude/index (sawtooth) for both n and z
      fft_sym_magnitude(complex, fft_bits, n, 0x4000000 / n, 0x4000000 / z);
    }
  }
0

There are 0 answers