GoogleWebRTC hangs (freezes) the main thread in a native Swift app (OpenVidu)


We have a hanging problem (the app freezes because the main thread locks up) in our native iOS (Swift) app with an OpenVidu implementation (which uses GoogleWebRTC under the hood). The specific conditions required: joining an existing room with at least 8 participants already streaming. With 6 participants it happens less often, and almost never with fewer than 6. It doesn't hang if participants join one by one, only if you join a room where all the other participants are already streaming. This suggests the issue is concurrency-related.

The GoogleWebRTC hangs on setRemoteDescription call:

func setRemoteDescription(sdpAnswer: String) {
    let sessionDescription: RTCSessionDescription = RTCSessionDescription(type: RTCSdpType.answer, sdp: sdpAnswer)
    self.peerConnection!.setRemoteDescription(sessionDescription, completionHandler: {(error) in
        print("Local Peer Remote Description set: " + error.debugDescription)
    })
}

[Screenshot: main thread hangs]

As you can see in the screenshot above, the main thread hangs on __psynch_cvwait. No other thread appears to be locked. The lock is never released, leaving the app completely frozen.

In an attempt to solve it, I tried the following:

  1. I moved the OpenVidu signaling server processing (RPC protocol) from the main thread into separate threads. As a result, the lock now occurs in one of the separate threads I created. It no longer blocks the UI, but it blocks OV signaling. The problem persists.

  2. I added a lock to process each signaling event (participant join event, publish video, etc.) synchronously (one by one). This didn't help either (it actually made the situation worse).

  3. Instead of using GoogleWebRTC v. 1.1.31999 from CocoaPods, I downloaded the latest GoogleWebRTC sources, built them in release configuration and included them in my project. This didn't solve the issue.

Any suggestions/comments would be appreciated. Thanks!

EDIT 1:

The signaling_thread and worker_thread are both waiting for something in the same kind of lock. Neither of them is executing any of my code at the moment of the lock.

I also tried running with a DEBUG build of GoogleWebRTC. In that case no locks happen, but everything runs much slower (which is OK for debugging, but we can't use this in production).

[Screenshot]

EDIT 2:

I tried wrapping the offer and setLocalDescription callbacks in an additional DispatchQueue, but this changes nothing. The problem is still easily reproducible (almost 100% of the time, if I have 8 participants with streams):

    self.peerConnection!.offer(for: constrains) { (sdp, error) in
        DispatchQueue.global(qos: .background).async {

            guard let sdp = sdp else {
                return
            }

            self.peerConnection!.setLocalDescription(sdp, completionHandler: { (error) in
                DispatchQueue.global(qos: .background).async {
                    completion(sdp)
                }
            })
        }
    }

There are 2 answers

Answer by Mike Keskinov (best answer)

Following a comment from the OpenVidu team, the problem was solved by adding a 100 ms delay between adding each of the participants who are already in the room. I consider this more of a hack than a real solution, but I can confirm that it works both in test and in production environments:

DispatchQueue.global(qos: .background).async {
    for info in dict.values {
        let remoteParticipant = self.newRemoteParticipant(info: info)
        if let streamId = info.streamId {
            remoteParticipant.createOffer(completion: {(sdp) in
                self.receiveVideoFrom(sdp: sdp, remoteParticipant: remoteParticipant, streamId: streamId)
            })
        } else {
            print("No streamId")
        }
        Thread.sleep(forTimeInterval: 0.1)
    }
}
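
As a variation on the loop above, the same 100 ms spacing can be expressed without putting a thread to sleep, by scheduling each participant's setup with `asyncAfter` on a serial queue. This is only a sketch: `staggered` is a hypothetical helper, not part of OpenVidu or GoogleWebRTC, and it has not been verified against the hang described in the question.

```swift
import Foundation

// Hypothetical helper (an assumption for illustration, not an OpenVidu or
// GoogleWebRTC API): runs `work` once per item on `queue`, spacing the start
// of consecutive items by `interval` seconds.
func staggered<T>(_ items: [T],
                  interval: TimeInterval,
                  on queue: DispatchQueue,
                  work: @escaping (T) -> Void) {
    for (index, item) in items.enumerated() {
        // Item 0 is scheduled immediately, item 1 after `interval`, and so on.
        queue.asyncAfter(deadline: .now() + interval * Double(index)) {
            work(item)
        }
    }
}
```

In the answer's code this would be called roughly as `staggered(Array(dict.values), interval: 0.1, on: someSerialQueue) { info in ... }`, with the body of the `for` loop (minus `Thread.sleep`) moved into the closure. Unlike the original, no thread sits blocked in `sleep` between participants.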

Answer by Byoungchan Lee

The WebRTC Obj-C API can be called from any thread, but most method calls are forwarded to WebRTC's internal thread, called the signaling thread.

Also, callbacks/observers like SetLocalDescriptionObserverInterface or RTCSetSessionDescriptionCompletionHandler are called from WebRTC on the signaling thread.

Looking at the screenshots, it seems that the signaling thread is currently blocked and can no longer process WebRTC API calls.

So, to avoid deadlocks, it's a good idea to create your own thread / dispatch_queue and handle the callbacks there.

See https://webrtc.googlesource.com/src/+/0a52ede821ba12ee6fff6260d69cddcca5b86a4e/api/g3doc/index.md and https://webrtc.googlesource.com/src/+/0a52ede821ba12ee6fff6260d69cddcca5b86a4e/api/g3doc/threading_design.md for details.
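
The pattern described in this answer can be sketched as follows. To keep the example self-contained, `FakePeerConnection` is a hypothetical stand-in for the real peer connection: it only mimics the behaviour described above (the completion handler is invoked on an internal "signaling" queue). The names `callbackQueue` and `applyAnswer` are likewise assumptions made for illustration. Whether this alone resolves the hang from the question is not established here.

```swift
import Foundation

// Hypothetical stand-in for a peer connection. As described above for the
// real API, it invokes the completion handler on its own internal queue.
final class FakePeerConnection {
    private let signalingQueue = DispatchQueue(label: "fake.signaling")

    func setRemoteDescription(_ sdp: String,
                              completionHandler: @escaping (Error?) -> Void) {
        signalingQueue.async { completionHandler(nil) }
    }
}

// App-owned serial queue: follow-up work runs here, not on the signaling queue.
let callbackQueue = DispatchQueue(label: "app.webrtc.callbacks")

func applyAnswer(_ sdp: String,
                 to peerConnection: FakePeerConnection,
                 completion: @escaping (Error?) -> Void) {
    peerConnection.setRemoteDescription(sdp) { error in
        // Return right away so the signaling queue is free again; avoid
        // blocking or making further peer-connection calls in this closure.
        callbackQueue.async {
            completion(error)
        }
    }
}
```

The point of the design is that the closure passed as the completion handler does nothing except hand the result over to a queue the app owns, so any slow or blocking work happens off WebRTC's internal thread.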