Extracting PCM data to generate a Chromaprint fingerprint using AudioGraph on Windows 10 (UAP)

I am currently trying to write a Windows 10 app that uses Chromaprint to identify a song and get its data from acoustid.org.

But my ExtractPcm method seems to return the wrong values. The first problem is that it returns too many values: I get 246 seconds of data where fpcalc.exe reports 237 seconds.

The second problem is that my implementation of ExtractPcm returns completely different values than a working implementation I found in an open-source project. I don't understand audio very well, but I think my values are definitely wrong.

The reference implementation I use to test my code is AresRpg, which uses BASS to extract the PCM data.

The data BASS returns starts with 10,054 zeros and continues as follows:

-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 -1 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 -1 -1 0

The last values, at the two-second mark, look like this:

612 627 635 647 655 656 662 662 663 658

My implementation starts with 10,528 zeros followed by:

-15194 11946 21344 12111 25732 12414 -28715 12748 6098 12973 -31198 13046 28784 12844 21248 12797 1592 13165 22544 13294 -20448

And it ends with:

13636 -17580 -18770 -17639 -29613 -17604 10168 -17608 20472 -17618

My ExtractPcm looks like this:

    public static async Task<PcmInfo> ExtractPcm(IStorageFile file)
    {
        int sampleRate = 0;
        int chanels = 0;
        double seconds = 0;
        short[] erg = null;
        var graphResult = await Windows.Media.Audio.AudioGraph.CreateAsync(new Windows.Media.Audio.AudioGraphSettings(Windows.Media.Render.AudioRenderCategory.Media));
        using (var graph = graphResult.Graph)
        {
            var encodingProperties = Windows.Media.MediaProperties.AudioEncodingProperties.CreatePcm(22050, 1, 16);

            var fileResult = await graph.CreateFileInputNodeAsync(file);
            var fileInput = fileResult.FileInputNode;
            {
                var output = graph.CreateFrameOutputNode(encodingProperties);
                {

                    fileInput.AddOutgoingConnection(output);
                    var duratio = fileInput.Duration.TotalSeconds;
                    var taskSource = new TaskCompletionSource<object>();
                    fileInput.FileCompleted += (source, e) =>
                    {
                        graph.Stop();
                        output.Stop();
                        taskSource.TrySetResult(null);
                    };
                    graph.Start();
                    await taskSource.Task;

                    var audioFrame = output.GetFrame();
                    using (var lockedBuffer = audioFrame.LockBuffer(Windows.Media.AudioBufferAccessMode.ReadWrite))
                    {
                        using (var refference = lockedBuffer.CreateReference())
                        {
                            await Task.Run(() =>
                             {
                                 unsafe
                                 {
                                     var memoryByteAccess = refference as IMemoryBufferByteAccess;
                                     byte* p;
                                     uint capacity;
                                     memoryByteAccess.GetBuffer(out p, out capacity);
                                     chanels = (int)output.EncodingProperties.ChannelCount;
                                     sampleRate = (int)output.EncodingProperties.SampleRate;
                                     int length = (int)(capacity / sizeof(Int16));
                                     Int16* b = (Int16*)(p);
                                     erg = new short[length];
                                     for (int i = 0; i < erg.Length; i++)
                                         erg[i] = b[i];
                                     seconds = length / (double)output.EncodingProperties.SampleRate / chanels;
                                 }

                             });
                        }
                    }
                }
            }
            var sb = new StringBuilder();
            foreach (var item in erg.SkipWhile(y => y == 0).Take(sampleRate * 2))
            {
                sb.Append($"{item} ");
            }
            System.Diagnostics.Debug.WriteLine($"Leading zeros {erg.TakeWhile(y => y == 0).Count()} ");
            System.Diagnostics.Debug.WriteLine(sb);
            return new PcmInfo() { Data = erg, Seconds = seconds, SampleRate = sampleRate, Chanels = chanels };
        }

    }

AresRPG uses the following code to read the data:

    public static System.Int16[] ExtractPcm(String file, out double seconds)
    {
        seconds = 0;
        int handle = Bass.BASS_StreamCreateFile(file, 0, 0, BASSFlag.BASS_STREAM_DECODE | BASSFlag.BASS_SAMPLE_MONO | BASSFlag.BASS_STREAM_PRESCAN);
        if (handle == 0)
        {
            BASSError error = Bass.BASS_ErrorGetCode();
            // System.Console.WriteLine("ERROR: " + error);
            return null;
        }
        long length = Bass.BASS_ChannelGetLength(handle);
        seconds = Bass.BASS_ChannelBytes2Seconds(handle, length);
        int mixHandle = Un4seen.Bass.AddOn.Mix.BassMix.BASS_Mixer_StreamCreate(22050, 1, BASSFlag.BASS_STREAM_DECODE);
        if (mixHandle == 0)
        {
            BASSError error = Bass.BASS_ErrorGetCode();
            // System.Console.WriteLine("ERROR: " + error);
            return null;
        }
        if (!Un4seen.Bass.AddOn.Mix.BassMix.BASS_Mixer_StreamAddChannel(mixHandle, handle, BASSFlag.BASS_DEFAULT))
        {
            BASSError error = Bass.BASS_ErrorGetCode();
            // System.Console.WriteLine("ERROR: " + error);
            return null;
        }
        List<System.Int16> data = new List<System.Int16>();
        while (true)
        {
            Int16[] buffer = new Int16[512];
            int num = Bass.BASS_ChannelGetData(mixHandle, buffer, buffer.Length * 2);
            if (num == -1)
            {
                BASSError error = Bass.BASS_ErrorGetCode();
                Bass.BASS_StreamFree(handle);
                // System.Console.WriteLine("ERROR: " + error);
                return null;
            }
            for (int i = 0; i < num / 2; ++i)
            {
                if (i < buffer.Length)
                    data.Add(buffer[i]);
                else
                    throw new ApplicationException();
            }
            if (num < buffer.Length * 2)
                break;
        }
        Bass.BASS_StreamFree(handle);
        try
        {
            return data.ToArray();
        }
        catch (System.OutOfMemoryException)
        {
            System.GC.Collect();
            return null;
        }
    }

EDIT

I changed the unsafe block to the following:

    await Task.Run(() =>
     {
         unsafe
         {
             var memoryByteAccess = refference as IMemoryBufferByteAccess;
             byte* p;
             uint capacity;

             memoryByteAccess.GetBuffer(out p, out capacity);
             chanels = (int)output.EncodingProperties.ChannelCount;
             sampleRate = (int)output.EncodingProperties.SampleRate;
             int length = Math.Min((int)(sampleRate * duratio) * chanels, (int)(capacity / sizeof(float)));
             float* b = (float*)(p);
             erg = new short[length];
             for (int i = 0; i < erg.Length; i++)
                 erg[i] = (Int16)(b[i] * Int16.MaxValue);
             seconds = length / (double)output.EncodingProperties.SampleRate / chanels;
         }

     });

That was because the data is stored in floating-point format, even though the SubType is not float.
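The float-to-16-bit step can be sketched in isolation as below. This is only an illustration, not part of the app: the class and method names (`PcmConvert`, `ToInt16`, `ToInt16Buffer`, `MisreadAsInt16`) are hypothetical, and it assumes the samples are normalized to roughly -1.0..1.0 on a little-endian machine. Unlike the cast in the edited block, it clamps and rounds, so a sample slightly above full scale cannot produce an out-of-range cast. `MisreadAsInt16` shows what happens when a 32-bit float is read as two 16-bit values, which appears consistent with the alternating numbers in the first dump.

```csharp
using System;

public static class PcmConvert
{
    // Convert one normalized float sample (nominally -1.0..1.0) to 16-bit PCM.
    // Clamping first keeps the float-to-short cast inside the valid range.
    public static short ToInt16(float sample)
    {
        if (float.IsNaN(sample)) return 0;
        float clamped = Math.Max(-1.0f, Math.Min(1.0f, sample));
        return (short)Math.Round(clamped * short.MaxValue);
    }

    // Convert a whole buffer of normalized float samples.
    public static short[] ToInt16Buffer(float[] samples)
    {
        var result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
            result[i] = ToInt16(samples[i]);
        return result;
    }

    // Reinterpret the 4 bytes of one float as two 16-bit values, the way the
    // first version of ExtractPcm read the buffer. Assumes little-endian:
    // index 0 holds the low mantissa bits, index 1 the sign/exponent half.
    public static short[] MisreadAsInt16(float sample)
    {
        byte[] bytes = BitConverter.GetBytes(sample);
        return new[] { BitConverter.ToInt16(bytes, 0), BitConverter.ToInt16(bytes, 2) };
    }
}
```

For neighbouring samples of similar magnitude, the second value of each misread pair changes slowly while the first looks random, which matches the pattern of every other number in the dump staying near 12000 to 13000.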

Now I get data that I (as a person) recognize as the song, but something is still off :(

My extraction is slightly longer and louder. When I play both versions at the same time, the start seems to be in sync, but shortly afterwards you can hear that the two tracks play at different speeds. By the end, the difference is about one second. I don't think the volume can influence the fingerprint, but the slower speed of my decoding could make the difference.
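To put a number on the speed difference, the two decodes can be compared by sample count. The sketch below is only a diagnostic aid under the assumption that the BASS decode is correct and that both arrays cover the same audio. The names `PcmTiming`, `DurationSeconds` and `EffectiveSampleRate` are hypothetical.

```csharp
using System;

public static class PcmTiming
{
    // Duration in seconds of interleaved PCM: samples / (rate * channels).
    public static double DurationSeconds(long sampleCount, int sampleRate, int channels)
    {
        if (sampleRate <= 0 || channels <= 0)
            throw new ArgumentException("sampleRate and channels must be positive");
        return sampleCount / ((double)sampleRate * channels);
    }

    // If a decode holds the same audio as the reference but in more samples,
    // this is the rate it would have to be played at to match the reference.
    public static double EffectiveSampleRate(long sampleCount, long referenceSampleCount, int nominalRate)
    {
        if (referenceSampleCount <= 0)
            throw new ArgumentException("referenceSampleCount must be positive");
        return nominalRate * (double)sampleCount / referenceSampleCount;
    }
}
```

For example, with made-up lengths of 238 s and 237 s at a nominal 22050 Hz mono, `EffectiveSampleRate(238L * 22050, 237L * 22050, 22050)` comes out a little above 22143 Hz. A result that deviates from the nominal rate like this would suggest the output is not really being resampled to the rate the code assumes.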

The sample I used for testing was Rolling in the Deep by Adele, which acoustid.org can identify. But I don't think I'm allowed to post the edited version.

So I used this CC-licensed song and created a WAV file that has my decoding in the left channel and the Ares decoding in the right channel.

Unfortunately, this song is not in the AcoustID database; at least I can't find the fingerprint.
