Kotlin - Encoding raw PCM data to AAC using MediaCodec and MediaMuxer

Hi guys, I'm trying to encode the PCM audio received from Android's AudioRecord into AAC using MediaCodec (I need the raw PCM data, but I also want to save it to an .m4a file). I haven't been able to get this to work yet and could use a fresh pair of eyes. The audio file I end up with seems corrupted: it contains data, but I can't play it.

Thoughts:

  1. Maybe the audio recorder init values are not the same as the encoder?

Disclaimer:

  1. I don't really have Android or Kotlin experience, so please excuse this mess.

Recording code:

    private val audioSource = MediaRecorder.AudioSource.MIC
    private val recorderSampleRate = 44100
    private val recorderChannels = AudioFormat.CHANNEL_IN_MONO
    private val recorderAudioEncoding = AudioFormat.ENCODING_PCM_16BIT
    private val samplesPerFrame = 2048
    private val bufferSizeRecording = AudioRecord.getMinBufferSize(
        recorderSampleRate,
        recorderChannels,
        recorderAudioEncoding
    )
    private var audioRecord: AudioRecord? = null
    private var audioEncoder: AudioEncoder? = null
    val audioDataChannel = AudioChannel<ByteArray>()


    private fun initAudioRecorder() {
        val desiredBufferDurationMs = 50
        var bufferSize = (recorderSampleRate * desiredBufferDurationMs / 1000) * 2

        if (bufferSize < bufferSizeRecording) {
            bufferSize = ((bufferSizeRecording / samplesPerFrame) + 1) * samplesPerFrame * 2
        }

        audioRecord = AudioRecord(
            audioSource,
            recorderSampleRate,
            recorderChannels,
            recorderAudioEncoding,
            bufferSize
        )

        if (audioRecord?.state != AudioRecord.STATE_INITIALIZED) {
            Log.d(WSTAG, "Audio Record can't initialize!")
            return
        }

        audioRecord?.startRecording()
    }


    // Initialize the AudioEncoder
    private fun initializeAudioEncoder(outputFilePath: String) {
        audioEncoder = AudioEncoder(
            sampleRate = recorderSampleRate,
            samplesPerFrame = samplesPerFrame,
            outputFilePath = outputFilePath,
            audioDataChannel = audioDataChannel, // Pass the channel here
            encodingCompleteCallback = {
                // This callback will be called when encoding is complete
                // You can perform any post-processing or cleanup here
            }
        )
        audioEncoder?.startEncoding()
    }

    private fun record() {
        recordingJob = CoroutineScope(Dispatchers.IO).launch {
            val outputFilePath = reactApplicationContext.filesDir.absolutePath + "/output.m4a"
            initializeAudioEncoder(outputFilePath)
            val desiredBufferDurationMs = 50
            var bufferSize = (recorderSampleRate * desiredBufferDurationMs / 1000) * 2

            if (bufferSize < bufferSizeRecording) {
                bufferSize = ((bufferSizeRecording / samplesPerFrame) + 1) * samplesPerFrame * 2
            }

            val buf = ByteArray(bufferSize)

            try {
                while (isActive) {
                    val bytesRead = audioRecord?.read(buf, 0, buf.size) ?: break
                    if (bytesRead == -1) break
                    socket?.send(buf.toByteString(0, bytesRead))
                    // Send the ByteArray to audioDataChannel
                    audioDataChannel.send(buf.copyOf(bytesRead))
                    //val pcmData = ByteBuffer.wrap(buf, 0, bytesRead)
                    //audioDataChannel.send(pcmData)
                }
            } catch (e: Exception) {
                Timber.e(e)
                Log.d("Hello", "Error: ${e.message}")
                stop()
            }
        }
    }

Encoder Code:

package com.deepgram.Audio

import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat
import android.media.MediaMuxer
import android.util.Log
import kotlinx.coroutines.*
import java.io.File
import java.io.FileOutputStream
import java.nio.ByteBuffer
import kotlinx.coroutines.channels.Channel

class AudioEncoder(
    private val sampleRate: Int,
    private val samplesPerFrame: Int,
    private val outputFilePath: String,
    private val audioDataChannel: Channel<ByteArray>,
    private val encodingCompleteCallback: (() -> Unit)?
) {
    private val MIME_TYPE = "audio/mp4a-latm"
    private val CHANNEL_COUNT = 1
    private val BIT_RATE = 64000
    private val AAC_PROFILE = MediaCodecInfo.CodecProfileLevel.AACObjectLC

    private var audioEncoder: MediaCodec? = null
    private var muxer: MediaMuxer? = null
    private var presentationTimeUs: Long = 0
    private var encodingJob: Job? = null
    private var trackIndex: Int = -1
    private var isEncoding = false

    fun startEncoding() {
        isEncoding = true
        initializeEncoderAndMuxer()
        startEncodingJob()
    }

    fun stopEncoding() {
        isEncoding = false
        encodingJob?.cancel()
    }

    private fun initializeEncoderAndMuxer() {
        deleteFileIfExists(outputFilePath)
        val f = File(outputFilePath)
        f.createNewFile()

        val format = MediaFormat()
        format.setString(MediaFormat.KEY_MIME, MIME_TYPE)
        format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, CHANNEL_COUNT)
        format.setInteger(MediaFormat.KEY_SAMPLE_RATE, sampleRate)
        format.setInteger(MediaFormat.KEY_BIT_RATE, BIT_RATE)
        format.setInteger(MediaFormat.KEY_AAC_PROFILE, AAC_PROFILE)
        format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 16384)

        audioEncoder = MediaCodec.createEncoderByType(MIME_TYPE)
        audioEncoder?.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        audioEncoder?.start()

        muxer = MediaMuxer(f.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
        val currentMuxer = muxer
        val currentAudioEncoder = audioEncoder

        if (currentMuxer != null && currentAudioEncoder != null) {
            trackIndex = currentMuxer.addTrack(currentAudioEncoder.outputFormat)
            currentMuxer.start()
        } else {
            // Handle the case where muxer is null
            trackIndex = -1
        }
    }

    private fun startEncodingJob() {
        encodingJob = CoroutineScope(Dispatchers.IO).launch {
            val outputFile = File(outputFilePath)
            val outputStream = FileOutputStream(outputFile)
            try {
                while (isActive) {
                    val pcmData: ByteArray? = audioDataChannel.receive()
                    pcmData?.let {
                        encodeAudio(pcmData)
                    }
                }
            } catch (e: Exception) {
                e.printStackTrace()
            } finally {
                outputStream.close()
                releaseEncoderAndMuxer()
                encodingCompleteCallback?.invoke()
            }
        }
    }

    private fun encodeAudio(pcmData: ByteArray) {
        if (!isEncoding) return
        val inputBufferId = audioEncoder?.dequeueInputBuffer(-1)
        if (inputBufferId != null && inputBufferId >= 0) {
            val inputBuffer = audioEncoder?.getInputBuffer(inputBufferId)
            inputBuffer?.clear()
            inputBuffer?.put(pcmData)

            val presentationTime = calculatePresentationTimestamp()

            audioEncoder?.queueInputBuffer(inputBufferId, 0, pcmData.size, presentationTime, 0)
        }

        val bufferInfo = MediaCodec.BufferInfo()
        var outputBufferId = audioEncoder?.dequeueOutputBuffer(bufferInfo, 0)
        while (outputBufferId != null && outputBufferId >= 0) {
            val outputBuffer = audioEncoder?.getOutputBuffer(outputBufferId)
            outputBuffer?.position(bufferInfo.offset)
            outputBuffer?.limit(bufferInfo.offset + bufferInfo.size)

            muxer?.writeSampleData(trackIndex, outputBuffer!!, bufferInfo)
            audioEncoder?.releaseOutputBuffer(outputBufferId, false)
            outputBufferId = audioEncoder?.dequeueOutputBuffer(bufferInfo, 0)
        }
    }

    private fun calculatePresentationTimestamp(): Long {
        val frameDurationUs = (1_000_000L * samplesPerFrame) / sampleRate
        presentationTimeUs += frameDurationUs
        return presentationTimeUs
    }

    private fun releaseEncoderAndMuxer() {
        audioEncoder?.stop()
        audioEncoder?.release()
        muxer?.stop()
        muxer?.release()
    }

    private fun deleteFileIfExists(filePath: String) {
        val file = File(filePath)
        if (file.exists()) {
            val deleted = file.delete()
            if (deleted) {
                Log.d("AudioEncoder", "Deleted existing file at: $filePath")
            } else {
                Log.d("AudioEncoder", "Failed to delete existing file at: $filePath")
            }
        }
    }
}

Some logs:

2023-09-04 20:12:25.697  8368-12742 MediaCodec              com.deepgramexample                  I  Codec shutdown complete
2023-09-04 20:12:25.701  8368-12740 MPEG4Writer             com.deepgramexample                  I  Normal stop process
2023-09-04 20:12:25.701  8368-12740 MPEG4Writer             com.deepgramexample                  D  Audio track stopping. Stop source
2023-09-04 20:12:25.701  8368-12740 MPEG4Writer             com.deepgramexample                  D  Audio track source stopping
2023-09-04 20:12:25.701  8368-12740 MPEG4Writer             com.deepgramexample                  D  Audio track source stopped
2023-09-04 20:12:25.701  8368-12749 MPEG4Writer             com.deepgramexample                  I  Received total/0-length (608/0) buffers and encoded 607 frames. - Audio
2023-09-04 20:12:25.701  8368-12749 MPEG4Writer             com.deepgramexample                  I  Audio track drift time: 0 us
2023-09-04 20:12:25.702  8368-12740 MPEG4Writer             com.deepgramexample                  D  Audio track stopped. Stop source
2023-09-04 20:12:25.702  8368-12740 MPEG4Writer             com.deepgramexample                  D  Stopping writer thread
2023-09-04 20:12:25.702  8368-12748 MPEG4Writer             com.deepgramexample                  D  0 chunks are written in the last batch
2023-09-04 20:12:25.702  8368-12740 MPEG4Writer             com.deepgramexample                  D  Writer thread stopped
2023-09-04 20:12:25.703  8368-12740 MPEG4Writer             com.deepgramexample                  I  Ajust the moov start time from 46439 us -> 46439 us
2023-09-04 20:12:25.705  8368-12740 MPEG4Writer             com.deepgramexample                  I  The mp4 file will not be streamable.
2023-09-04 20:12:25.705  8368-12740 MPEG4Writer             com.deepgramexample                  D  Audio track stopping. Stop source
2023-09-04 20:12:25.705  8368-12740 Hello                   com.deepgramexample                  D  Encoding Complete

I tried all of the code above, and the end result was a corrupted file. I wanted to get an .m4a file that I could play after stopping the audio recorder.

I could not find any way of encoding live PCM data other than Android's MediaCodec.

There is 1 answer

dev.bmax

After a quick glance over your code I can see a few possible issues:

  1. Don't forget to specify the encoding of the data when you configure the encoder.

    format.setInteger(MediaFormat.KEY_PCM_ENCODING, AudioFormat.ENCODING_PCM_16BIT)

  2. If you call currentAudioEncoder.outputFormat immediately after configuring the codec you will get an incomplete format (as stated in the docs). You are supposed to do it either after you receive an INFO_OUTPUT_FORMAT_CHANGED signal or after receiving the first output buffer.

    See an asynchronous mode example.

    See a synchronous mode example.

  3. Consider using ByteBuffer / ShortBuffer instead of ByteArray when working with raw data, in order to prevent bugs related to sample size and byte order.

    Another example.
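
To make point 2 concrete, here is a rough sketch of a synchronous drain loop that only adds the track and starts the muxer once the codec reports INFO_OUTPUT_FORMAT_CHANGED. The class name MuxerDrain and its members are made up for illustration and are not from your code. It assumes the codec is already configured and started, and that the muxer has been created but not yet started.

```kotlin
import android.media.MediaCodec
import android.media.MediaMuxer

// Hypothetical helper, not part of the question's code.
// Assumes `codec` is configured and started, and `muxer` is created but not started.
class MuxerDrain(private val codec: MediaCodec, private val muxer: MediaMuxer) {
    private val info = MediaCodec.BufferInfo()
    private var trackIndex = -1
    private var muxerStarted = false

    /** Drains whatever output is currently ready; call it after each queueInputBuffer. */
    fun drain(timeoutUs: Long = 0L) {
        while (true) {
            val id = codec.dequeueOutputBuffer(info, timeoutUs)
            when {
                // Nothing more to read right now.
                id == MediaCodec.INFO_TRY_AGAIN_LATER -> return

                id == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED -> {
                    // Only at this point is the output format complete, so add the track here.
                    check(!muxerStarted) { "output format changed after the muxer was started" }
                    trackIndex = muxer.addTrack(codec.outputFormat)
                    muxer.start()
                    muxerStarted = true
                }

                id >= 0 -> {
                    // Codec config buffers are already carried by the format, so skip them.
                    val isConfig = (info.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0
                    val out = codec.getOutputBuffer(id)
                    if (muxerStarted && !isConfig && info.size > 0 && out != null) {
                        out.position(info.offset)
                        out.limit(info.offset + info.size)
                        muxer.writeSampleData(trackIndex, out, info)
                    }
                    codec.releaseOutputBuffer(id, false)
                    if ((info.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) return
                }
                // Any other negative status code is ignored in this sketch.
            }
        }
    }
}
```

In your encodeAudio you would call drain() right after queueInputBuffer, in place of the current output loop, and drop the addTrack / start calls from initializeEncoderAndMuxer. I have not run this against your project, so treat it as a starting point rather than a drop-in fix.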
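
As a small illustration of point 3, this is one way to view a 16-bit PCM chunk through a ByteBuffer with an explicit byte order, and to derive a chunk's duration from its real byte count instead of a fixed frame size. Both function names are hypothetical, and little-endian sample order is an assumption about the device rather than something I checked.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: reads the first `length` bytes of `pcm` as 16-bit samples.
// Assumes little-endian sample order (an assumption, not verified for your device).
fun pcm16ToShorts(pcm: ByteArray, length: Int = pcm.size): ShortArray {
    require(length in 0..pcm.size) { "length out of range" }
    require(length % 2 == 0) { "16-bit PCM needs an even number of bytes" }
    val shorts = ShortArray(length / 2)
    ByteBuffer.wrap(pcm, 0, length)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer()
        .get(shorts)
    return shorts
}

// Hypothetical helper: duration of a 16-bit PCM chunk in microseconds,
// computed from the number of bytes actually read.
fun chunkDurationUs(byteCount: Int, sampleRate: Int, channelCount: Int): Long {
    val frames = byteCount / (2 * channelCount) // 2 bytes per sample, per channel
    return frames * 1_000_000L / sampleRate
}
```

For example, chunkDurationUs(4410, 44100, 1) gives 50000, which matches the 50 ms buffer your recorder asks for at 44100 Hz mono.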