Python: Audio segmentation with overlapping and hamming windows

I would like to do the following:

  1. Segment the audio file (divide it into frames) - to avoid information loss, the frames should overlap.
  2. In each frame, apply a window function (Hann, Hamming, Blackman etc) - to minimize discontinuities at the beginning and end.

I managed to save the audio file as a numpy array:

import struct
import wave

import numpy as np

def wave_open(path, normalize=True, rm_constant=False):
    # normalize and rm_constant are not used yet
    wav = wave.open(path, 'rb')
    frames_n = wav.getnframes()
    channels = wav.getnchannels()
    sample_rate = wav.getframerate()
    duration = frames_n / float(sample_rate)
    read_frames = wav.readframes(frames_n)
    wav.close()
    # Assumes 16-bit samples ("h"); the parentheses make the sample count explicit
    data = struct.unpack("%dh" % (channels * frames_n), read_frames)
    if channels == 1:
        data = np.array(data, dtype=np.int16)
        return data
    else:
        print("More channels are not supported")

Then I applied a Hamming window to the whole signal:

N = 11145
win = np.hanning(N)
windowed_signal = (np.fft.rfft(win*data))

But I don't know how to split my signal into frames (segments) before applying the Hamming window. Please help me :)

There are 2 answers

enne On

Here is a solution using librosa.

import librosa
import numpy as np

x = np.arange(0, 128)
frame_len, hop_len = 16, 8
frames = librosa.util.frame(x, frame_length=frame_len, hop_length=hop_len)
# frames has shape (frame_len, n_frames): each column is one frame
windowed_frames = np.hanning(frame_len).reshape(-1, 1)*frames

# Print frames (transpose so that each row is one frame)
for i, frame in enumerate(frames.T):
    print("Frame {}: {}".format(i, frame))

# Print windowed frames
for i, frame in enumerate(windowed_frames.T):
    print("Win Frame {}: {}".format(i, np.round(frame, 3)))

Output:

Frame 0: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
Frame 1: [ 8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Frame 2: [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
Frame 3: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
Frame 4: [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Frame 5: [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55]
Frame 6: [48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
Frame 7: [56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]
Frame 8: [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
Frame 9: [72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87]
Frame 10: [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
Frame 11: [ 88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103]
Frame 12: [ 96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111]
Frame 13: [104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119]
Frame 14: [112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]

Win Frame 0: [0.    0.043 0.331 1.036 2.209 3.75  5.427 6.924 7.913 8.141 7.5   6.075
 4.146 2.151 0.605 0.   ]
Win Frame 1: [ 0.     0.389  1.654  3.8    6.627  9.75  12.663 14.836 15.825 15.377
 13.5   10.493  6.91   3.474  0.951  0.   ]
Win Frame 2: [ 0.     0.735  2.978  6.564 11.045 15.75  19.899 22.749 23.738 22.613
 19.5   14.911  9.674  4.798  1.297  0.   ]
Win Frame 3: [ 0.     1.081  4.301  9.328 15.463 21.75  27.135 30.661 31.65  29.849
 25.5   19.329 12.438  6.121  1.643  0.   ]
Win Frame 4: [ 0.     1.426  5.625 12.092 19.882 27.75  34.371 38.574 39.563 37.085
 31.5   23.747 15.202  7.445  1.988  0.   ]
Win Frame 5: [ 0.     1.772  6.948 14.856 24.3   33.75  41.607 46.486 47.476 44.321
 37.5   28.165 17.966  8.768  2.334  0.   ]
Win Frame 6: [ 0.     2.118  8.272 17.62  28.718 39.75  48.843 54.399 55.388 51.557
 43.5   32.584 20.729 10.092  2.68   0.   ]
Win Frame 7: [ 0.     2.464  9.595 20.384 33.136 45.75  56.08  62.312 63.301 58.793
 49.5   37.002 23.493 11.415  3.026  0.   ]
Win Frame 8: [ 0.     2.81  10.919 23.148 37.554 51.75  63.316 70.224 71.213 66.029
 55.5   41.42  26.257 12.738  3.372  0.   ]
Win Frame 9: [ 0.     3.156 12.242 25.912 41.972 57.75  70.552 78.137 79.126 73.265
 61.5   45.838 29.021 14.062  3.718  0.   ]
Win Frame 10: [ 0.     3.501 13.566 28.676 46.39  63.75  77.788 86.049 87.038 80.501
 67.5   50.256 31.785 15.385  4.063  0.   ]
Win Frame 11: [ 0.     3.847 14.889 31.44  50.808 69.75  85.024 93.962 94.951 87.737
 73.5   54.674 34.549 16.709  4.409  0.   ]
Win Frame 12: [  0.      4.193  16.213  34.204  55.226  75.75   92.26  101.875 102.864
  94.973  79.5    59.092  37.313  18.032   4.755   0.   ]
Win Frame 13: [  0.      4.539  17.536  36.968  59.645  81.75   99.496 109.787 110.776
 102.209  85.5    63.51   40.077  19.356   5.101   0.   ]
Win Frame 14: [  0.      4.885  18.86   39.732  64.063  87.75  106.732 117.7   118.689
 109.446  91.5    67.929  42.841  20.679   5.447   0.   ]
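
Since the question goes on to take an FFT, here is a minimal sketch of the same framing done with plain NumPy indexing, followed by a per-frame np.fft.rfft. It assumes the (frame_len, n_frames) layout used above and swaps in np.hamming for the window; the names idx, n_frames and spectra are only illustrative.

```python
import numpy as np

x = np.arange(0, 128)
frame_len, hop_len = 16, 8

# Number of complete frames that fit into the signal
n_frames = 1 + (len(x) - frame_len) // hop_len

# idx[i, j] = i + j * hop_len, so column j holds the sample indices of frame j
idx = np.arange(frame_len)[:, None] + hop_len * np.arange(n_frames)[None, :]
frames = x[idx]  # shape (frame_len, n_frames)

# Apply a Hamming window to every frame (broadcast along the columns)
windowed_frames = np.hamming(frame_len)[:, None] * frames

# One real FFT per frame, taken along the frame axis
spectra = np.fft.rfft(windowed_frames, axis=0)  # shape (frame_len // 2 + 1, n_frames)
```

Samples at the end of the signal that do not fill a complete frame are dropped here; pad the signal first if you need them.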
Mark H On

Actually, you're applying a Hann/Hanning window. Use np.hamming() to get a Hamming window.
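
A quick way to see the difference, as a small sketch (the length N = 16 is arbitrary): the Hann window goes all the way to zero at both ends, while the Hamming window stops at about 0.08.

```python
import numpy as np

N = 16  # arbitrary window length, just for illustration

hann = np.hanning(N)     # Hann window: end points are 0
hamming = np.hamming(N)  # Hamming window: end points are about 0.08

print("Hann ends:   ", hann[0], hann[-1])
print("Hamming ends:", hamming[0], hamming[-1])
```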

To split the array into non-overlapping frames, you can use np.split() or np.array_split().

Here's an example:

import numpy as np

x = np.arange(0, 128)
frame_size = 16
y = np.split(x, range(frame_size, x.shape[0], frame_size))
for v in y:
    print(v)

Output:

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
[48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
[64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
[ 96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111]
[112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]
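
The frames above do not overlap. If you want overlapping frames without an extra dependency, one possible sketch uses numpy.lib.stride_tricks.sliding_window_view (available from NumPy 1.20) and keeps every hop_size-th window. The name hop_size and the 50% overlap are assumptions for the example, not something from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(0, 128)
frame_size = 16
hop_size = 8  # assumed hop: 50% overlap between neighbouring frames

# Every possible window of length frame_size, then keep each hop_size-th one
frames = sliding_window_view(x, frame_size)[::hop_size]  # shape (n_frames, frame_size)

# Multiply each frame (row) by a Hamming window
windowed = frames * np.hamming(frame_size)

for v in frames:
    print(v)
```

Here each row is one frame, so the window broadcasts along the last axis without any reshaping.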