I have a dataset of voice recordings in which the first part and the second part of each recording are in different languages and have different pitch. What solutions to this problem do you know?
I tried to implement this using audio smoothing with a frequency threshold and pitch shifting with librosa, but I did not get any good results.
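
To make the pitch-shifting attempt concrete, here is a minimal sketch of how the shift amount could be computed. It assumes the median fundamental frequency of each part has already been estimated (for example with a pitch tracker such as `librosa.pyin`). The function name `semitone_offset` and the example frequencies are hypothetical, used only for illustration.

```python
import math


def semitone_offset(f0_source_hz, f0_target_hz):
    """Return how many semitones the source must be shifted to match the target pitch."""
    if f0_source_hz <= 0 or f0_target_hz <= 0:
        raise ValueError("fundamental frequencies must be positive")
    # One octave (a doubling of frequency) corresponds to 12 semitones.
    return 12.0 * math.log2(f0_target_hz / f0_source_hz)


# Hypothetical median pitches: first part at 220 Hz, second part at 110 Hz.
n_steps = semitone_offset(220.0, 110.0)
print(n_steps)  # negative value: the first part would be shifted down

# Assumption: the value could then be passed to librosa, e.g.
# librosa.effects.pitch_shift(y_first_part, sr=sr, n_steps=n_steps)
```

A fixed offset like this only matches the average pitch of the two parts; it does not account for the pitch varying within each part.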