I am trying to do a fast FFT convolution (FFT block size = 1024 samples) of a head-related impulse response (L = 512 samples) with a sine wave audio signal. Here is a plot of the impulse response:
http://fs2.directupload.net/images/150617/fc9j6cs7.png
I split the wave audio signal into blocks of size M = 513. Then I zero-padded each audio block and the HRTF to 1024 samples and applied the FFT, the multiplication and the IFFT. The result for one block is shown in the following picture:
http://fs1.directupload.net/images/150617/bxoe9fkm.png
After this I shifted each block 513 samples further along the time axis than the previous block (hop size = 513, i.e. no overlap between the input blocks) and added it to the previous output, which gave a correct convolved output.
Here is a simplified version of my Python code, which adds up 5 output blocks:
import numpy as np
from numpy.fft import fft, ifft
from scipy.io import wavfile

# set blocksizes (constant, so they are defined once before the loop)
fft_blocksize = 1024
audio_blocksize = 513
hrtf_blocksize = 512
slide_forward_samples = 513
num_blocks = 5
# read in audio file and impulse response (both assumed to be mono)
_, audiodata = wavfile.read("filename_audio_wave")
_, hrtf_block = wavfile.read("filename_hrtf_wave")
# zeropad and transform the hrtf once; it is the same for every block
hrtf_block_zeropadded = np.zeros((fft_blocksize, ))
hrtf_block_zeropadded[0:hrtf_blocksize, ] = hrtf_block
hrtf_block_fft = fft(hrtf_block_zeropadded, fft_blocksize)
# output buffer: allocated once (inside the loop it would erase the previous
# blocks) and as float, because += of float data into an int16 array fails
binaural = np.zeros((fft_blocksize*num_blocks, ))
# set iteration counter to 0
blocknumber = 0
while blocknumber < num_blocks:
    # Do zeropadding: zeropad the current audio block
    audio_block_zeropadded = np.zeros((fft_blocksize, ))
    audio_block_zeropadded[0:audio_blocksize, ] = audiodata[blocknumber*audio_blocksize : (blocknumber+1)*audio_blocksize, ]
    # bring time domain input to frequency domain
    audio_block_fft = fft(audio_block_zeropadded, fft_blocksize)
    binaural_block_frequency = hrtf_block_fft * audio_block_fft
    binaural_block = ifft(binaural_block_frequency, fft_blocksize).real
    # add the block to the other blocks (overlap-add)
    binaural[blocknumber*slide_forward_samples : blocknumber*slide_forward_samples+fft_blocksize, ] += binaural_block
    blocknumber += 1
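As a sanity check on the block sizes: with M = 513 and L = 512, each block's linear convolution has M + L - 1 = 1024 samples, so it fits into a 1024-point FFT without circular wrap-around. The following minimal sketch runs the same unwindowed overlap-add scheme on synthetic data (a placeholder 440 Hz sine at an assumed 44.1 kHz sample rate and a random 512-tap impulse response instead of the real HRIR) and compares it with direct convolution. The function name `overlap_add_convolve` is hypothetical; it uses `np.fft.rfft`/`irfft`, which for real signals gives the same result as `fft` followed by `ifft(...).real`.

```python
import numpy as np

def overlap_add_convolve(x, h, block_len=513, nfft=1024):
    """Unwindowed overlap-add convolution of x with a fixed impulse response h."""
    # Each block's linear convolution has block_len + len(h) - 1 samples;
    # it must fit in nfft samples or the FFT product wraps around circularly.
    assert block_len + len(h) - 1 <= nfft
    h_fft = np.fft.rfft(h, nfft)            # zero-pad h to nfft and transform once
    out = np.zeros(len(x) + nfft)           # room for the last block's tail
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]  # the last block may be shorter
        y = np.fft.irfft(np.fft.rfft(block, nfft) * h_fft, nfft)
        out[start:start + nfft] += y        # shift by block_len and add
    return out[:len(x) + len(h) - 1]        # length of the full linear convolution

fs = 44100
t = np.arange(5 * 513) / fs
x = np.sin(2 * np.pi * 440 * t)                     # placeholder sine signal
h = np.random.default_rng(0).standard_normal(512)   # placeholder impulse response
y = overlap_add_convolve(x, h)
print(np.allclose(y, np.convolve(x, h)))  # True
```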
In the next step I wanted to convolve each block with a slightly different impulse response, which led to crackling noise between the blocks. I found out that I have to apply a window and let the convolved blocks overlap, but I don't understand exactly how to do it. Can you please give me some advice?
Suppose we want an overlap of 50% and use the Hamming window.
- Is it correct that every block now needs to contain 50% of the samples of the previous block?
- Where do I have to apply the window? Before the FFT convolution, on the audio signal blocks (window size: 513 samples), or on the IFFT output (window size: 1024 samples)?
- And by how many samples do I need to shift the FFT output signal along the time axis with 50% overlap?
Thank you very much for your help.
Using a window with overlap-add/save fast convolution is rarely the correct way to filter. But if you want to try:
Note that a sequence of (periodic) Von Hann windows, offset by half their length, sums to unity gain, except at the very beginning and end.
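A quick numerical check of this claim, as a sketch with hypothetical helper names: the unity sum holds exactly for the periodic (DFT-even) Hann window `0.5 - 0.5*cos(2*pi*n/N)`, while NumPy's `np.hanning` is the symmetric variant and only sums to approximately 1. A Hamming window is included for comparison: its overlapped sum is also constant, but equals 1.08 rather than 1.

```python
import numpy as np

def periodic_hann(n):
    # Periodic (DFT-even) Hann window; scipy.signal.get_window("hann", n) gives the same.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def periodic_hamming(n):
    return 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / n)

def overlapped_sum(w, hop, n_windows=8):
    """Sum n_windows copies of w, each shifted by hop samples, and return the
    middle part where the number of overlapping windows is constant."""
    total = np.zeros(hop * (n_windows - 1) + len(w))
    for k in range(n_windows):
        total[k * hop:k * hop + len(w)] += w
    # Drop one window length at each end (the ramp-up and ramp-down regions).
    return total[len(w):-len(w)]

N = 512
print(np.allclose(overlapped_sum(periodic_hann(N), N // 2), 1.0))      # True
print(np.allclose(overlapped_sum(np.hanning(N), N // 2), 1.0))         # False (symmetric window)
print(np.allclose(overlapped_sum(periodic_hamming(N), N // 2), 1.08))  # True: constant, but not 1
```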
So change your data window length from 513 to 512, use an offset of 256 (half of 512, for unity gain), use a Von Hann window (a Hamming window will change the gain), zero-pad to an FFT length of at least the window length plus the impulse response length minus one (512 + 512 - 1 = 1023, so 1024 still works), and use overlap-add/save with the remainder (perhaps carrying the tail over several input window segments).
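The recipe above could be sketched like this, assuming mono float signals and a 512-sample periodic Hann data window applied to the input blocks before the FFT (which is how "data window" is read here). The function `windowed_ola` and its `ir_for_frame` callback are hypothetical names, not part of any library; the callback lets each frame use a different impulse response (for example `lambda k: hrirs[k]` for a hypothetical list `hrirs`), all assumed to have the same length. Each frame's output is `nfft` samples long, so its tail is added over the next few frames.

```python
import numpy as np

def periodic_hann(n):
    # Periodic Hann window: copies shifted by n/2 sum to exactly 1.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def windowed_ola(x, ir_for_frame, win_len=512, nfft=1024):
    """Hann-windowed overlap-add filtering with 50% overlap.

    ir_for_frame(k) returns the impulse response for frame k (hypothetical
    callback; all responses are assumed to have the same length).
    win_len is assumed to be even.
    """
    hop = win_len // 2                      # 50% overlap -> window sum is 1
    ir_len = len(ir_for_frame(0))
    # One windowed frame convolved with the IR must fit without wrap-around.
    assert nfft >= win_len + ir_len - 1
    w = periodic_hann(win_len)
    # Leading `hop` zeros: every real sample is then covered by two windows.
    # Trailing zeros: the last real samples still get a complete second window.
    xp = np.concatenate([np.zeros(hop), np.asarray(x, dtype=float), np.zeros(win_len)])
    n_frames = (len(xp) - win_len) // hop + 1
    out = np.zeros((n_frames - 1) * hop + nfft)
    for k in range(n_frames):
        start = k * hop
        frame = xp[start:start + win_len] * w   # window the input block
        h = ir_for_frame(k)
        y = np.fft.irfft(np.fft.rfft(frame, nfft) * np.fft.rfft(h, nfft), nfft)
        # Each result is nfft samples long, so its tail overlaps several later frames.
        out[start:start + nfft] += y
    # Drop the leading padding; keep the full linear-convolution length.
    return out[hop:hop + len(x) + ir_len - 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(3000)
h = rng.standard_normal(512)
# With the same IR for every frame the result matches direct convolution.
print(np.allclose(windowed_ola(x, lambda k: h), np.convolve(x, h)))  # True
```

Because the window sum is 1 only where two windows overlap, the sketch pads `hop` zeros in front of the signal and trims them afterwards, so the ramp-up of the first window never touches real samples. With a time-varying `ir_for_frame`, the crossfade between neighbouring frames comes from the overlapping Hann windows.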
513 is no good: half of 513 is not an integer, so rounding the offset to the nearest integer causes some ripple in the summed gain of all the overlapped windows.
You could also use an offset of 1/4 of the window length (128 samples for a 512-sample window), which doubles the gain, and compensate for this 2x gain in post-processing.
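Both points can be checked with a window-sum sketch (hypothetical helper names, periodic Hann windows): a 513-sample window with a 256-sample offset leaves a small ripple in the summed gain (on the order of 0.3%), while a 512-sample window with a 1/4 offset (128 samples) gives a constant gain of 2, which a division by 2 removes.

```python
import numpy as np

def periodic_hann(n):
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def overlapped_sum(w, hop, n_windows=16):
    """Sum of shifted copies of w, trimmed to the region of constant overlap."""
    total = np.zeros(hop * (n_windows - 1) + len(w))
    for k in range(n_windows):
        total[k * hop:k * hop + len(w)] += w
    return total[len(w):-len(w)]

# 513-sample window: half of 513 is not an integer, so hop 256 leaves ripple.
s513 = overlapped_sum(periodic_hann(513), 256)
print(s513.max() - s513.min() > 1e-3)   # True: visible gain ripple
# 512-sample window with a 1/4 offset (hop 128): constant gain of 2.
s_quarter = overlapped_sum(periodic_hann(512), 128)
print(np.allclose(s_quarter, 2.0))      # True; divide the output by 2 to compensate
```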