Some years ago I made a music audio recording, and I can't find the original WAV files, I have only compressed MP3s. Now I found an audio CD, but I don't know if it was made using the original, uncompressed WAVs, or if it was made from compressed MP3 or OGG files.
Is there a way how to detect if an audio sample has been compressed and decompressed using a lossy compression such as MP, OGG, ..., without having the original to compare to?
Update:
Trying @MisterHenson's suggestion, I plotted the spectra of the two samples, with obvious differences in the graphs:
The sample from the CD:
The sample from the MP3:
This practically solves solves my current problem, but still I have these open questions:
- If the spectra were visually indistinguishable, I wouldn't know if there is a real difference, or that I just can't distinguish them (i.e. the compression would be of better quality). What else could I try?
- Similarly what would I do if I didn't have the MP3 file to compare to, just a single audio sample?
- Is there an automated method, that'd answer the question with a reasonable probability?
I made an example to stress the topology of all MP3 transcodes, the source material being a Chopin nocturne. MP3 on top, Lossless on bottom. All recordings have background noise of some amplitude, and that noise is faintly visible here. What the MP3 transcode (Lame's V2 preset in this case) does is create a hard limit at ~16kHz. On a 320kbps bitrate 44.1kHz sample rate MP3, this hard limit appears at around 20kHz, but it would still be visibly different in this image.
You can pick out this shelf without having the original lossless file for comparison. I'm willing to say all music has amplitude at frequencies above even 19kHz. Here's an example for which I do not have the lossless source file, just a 320kbps MP3. You can see the very hard limit at 20kHz as well as a milder cutoff at 19kHz. Were it lossless, that red blob in the middle would extend all the way up to 22kHz since the sample rate is 44.1kHz.
I would say this process is probably automatable, but I do not know of any attempts to automate it. If this were automated, though, I'd say it could pick Lossy from Lossless with much higher accuracy than you or I, by virtue of it being able to analyze the entire spectrum as opposed to just the high frequency cutoffs.
Full res images: