I am writing an RTSP client with the Flutter SDK that also uses Apple's VideoToolbox API for hardware decoding. I will refrain from posting code for now because I think the problem is easier to explain in words.
I'm using Wireshark to inspect the packets, and parsing seems to be working correctly. My problem is that I can't get the data into the format the decoder expects, so I get OSStatus error code -8969 in Swift. If someone could clarify the AVCC format, along with the inner payload format expected by the decoder, that would be great.
I decode one frame at a time, so I create a new decoding session for each frame.
The SPS and PPS values are set to static values at program startup and then updated once the server begins sending RTP. I don't parse the sprop-parameter-sets value at the moment; I will add that later.
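For reference, parsing sprop-parameter-sets is mostly string handling: per RFC 6184 the value is a comma-separated list of base64-encoded NAL units (typically the SPS followed by the PPS). The following is only a sketch; the function name `parseSpropParameterSets` and the sample fmtp line are made up for illustration, and it assumes the whole `a=fmtp` line from the SDP is passed in as a single string.

```swift
import Foundation

/// Extracts the parameter sets from an SDP `a=fmtp` line.
/// Returns the decoded NAL units in the order they appear (typically SPS, then PPS).
/// Hypothetical helper, not part of any Apple API.
func parseSpropParameterSets(fmtp: String) -> [Data] {
    let key = "sprop-parameter-sets="
    // fmtp parameters are separated by ';'.
    for param in fmtp.split(separator: ";") {
        guard let keyRange = param.range(of: key) else { continue }
        let value = param[keyRange.upperBound...].trimmingCharacters(in: .whitespaces)
        // Each comma-separated item is one base64-encoded NAL unit;
        // items that fail to decode are skipped.
        return value.split(separator: ",").compactMap { Data(base64Encoded: String($0)) }
    }
    return []
}

// Example with a made-up fmtp line.
let parameterSets = parseSpropParameterSets(
    fmtp: "a=fmtp:96 packetization-mode=1; sprop-parameter-sets=Z0IAHw==,aM48gA==")
print(parameterSets.map { $0.count })
```

The decoded byte arrays carry no start code and no length prefix, so they can be handed over as raw parameter sets when the format description is built.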
The buffer below is the AVCC data I build for a FU-A RTP payload with 3 slices. Please let me know if I get anything wrong here. Each 4-byte length is big-endian. When I create the decoding session in Swift, I treat all of this as one sample.
[4 byte length][FU identifier slice 1][FU header slice 1][NAL Unit payload slice 1][4 byte length][FU identifier slice 2][FU header slice 2][NAL Unit payload slice 2][4 byte length][FU identifier slice 3][FU header slice 3][NAL Unit payload slice 3]
The length is:
length = RTSP length field - RTP_HEADER_LEN.
where RTP_HEADER_LEN is 12 bytes.
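For comparison with the layout above: in RFC 6184 the FU indicator and FU header are transport framing, not part of the NAL unit. A receiver normally drops both bytes from every fragment, rebuilds the one-byte NAL header from the F/NRI bits of the FU indicator and the type bits of the FU header, and concatenates the fragment payloads into a single NAL unit. That reassembled NAL unit then gets one 4-byte big-endian length prefix. The sketch below assumes the three parts are FU-A fragments of one NAL unit, arrive in order with no loss, and already have the RTP header stripped; `reassembleFUA` and `avccFrame` are hypothetical names.

```swift
import Foundation

/// Rebuilds one complete NAL unit from the FU-A fragments that carry it.
/// Each element of `fragments` is an RTP payload with the RTP header already removed.
/// Returns nil if the input is not a well-formed FU-A sequence.
func reassembleFUA(fragments: [[UInt8]]) -> [UInt8]? {
    guard let first = fragments.first, let last = fragments.last else { return nil }
    guard first.count > 2, last.count > 2 else { return nil }
    // The first fragment must have the Start bit set, the last one the End bit.
    guard first[1] & 0x80 != 0, last[1] & 0x40 != 0 else { return nil }

    // Original NAL header: F and NRI bits from the FU indicator, type from the FU header.
    var nal: [UInt8] = [(first[0] & 0xE0) | (first[1] & 0x1F)]
    for fragment in fragments {
        // Every fragment must be an FU-A (type 28) with at least one payload byte.
        guard fragment.count > 2, fragment[0] & 0x1F == 28 else { return nil }
        // Skip the FU indicator and FU header; keep only the payload.
        nal.append(contentsOf: fragment[2...])
    }
    return nal
}

/// Prefixes a NAL unit with its length as a 4-byte big-endian integer (AVCC framing).
func avccFrame(nal: [UInt8]) -> [UInt8] {
    let length = UInt32(nal.count)
    return [UInt8(length >> 24), UInt8((length >> 16) & 0xFF),
            UInt8((length >> 8) & 0xFF), UInt8(length & 0xFF)] + nal
}

// Example: three fragments of an IDR slice (NAL type 5, NRI 3).
let fragments: [[UInt8]] = [
    [0x7C, 0x85, 0xAA, 0xBB],  // Start bit set
    [0x7C, 0x05, 0xCC],        // middle fragment
    [0x7C, 0x45, 0xDD],        // End bit set
]
if let nal = reassembleFUA(fragments: fragments) {
    print(avccFrame(nal: nal))
}
```

With this framing the length counts the reconstructed NAL header plus the payload bytes, so it is not simply the packet length minus the 12-byte RTP header.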
Any guidance appreciated, thank you.
 
                        
I switched to HLS, as RTSP is not widely supported. The Swift API isn't really set up to accept decoded images and assemble them into a video, so I moved to something that is widely supported. Even better, the API just works, so setting it up wasn't so bad and it saved time.