I have spent days hunting down a connection problem without any luck. I'm trying to implement a relatively simple one2one Call with Kurento.
Below you will find a debug log of Kurento of a case where the connection could be established and a case where the connection failed.
If you need any more logs (e.g. of the clients, the signalling server, tcpdumps, or trace logs of Kurento), just let me know and I will provide them!
Any help or new input is greatly appreciated!
Description of the Problem:
In about 30% of cases, the WebRTC connection cannot be established. Unfortunately I can't find any pattern in when the connection can be established and when it can't; it seems completely random. I'm in the same network, using the same devices, the same TURN server, and the same signalling protocol, but in 30% of cases the connection cannot be established.
When I run the application locally, it seems to work much more reliably, the connection can be established almost 100% of the time (or maybe even 100% of time, I have tested so many times I lost track). I set up the infrastructure locally with docker, and run the different containers (TURN, Kurento, Signalling) in separate networks to mimic a production deployment.
We experience the same behavior in our development and production environment. In our development environment we have absolutely no firewalls in place, so that doesn't seem to be the problem.
What I have tried to find the cause of the Problem:
Mostly I have been comparing logs of cases that worked and cases that didn't work but I have failed to find any significant difference between them that could point me to the problem.
I have tested the WebRTC connection over the TURN server (with Firefox and the force_relay flag) and over Kurento directly, but in both cases the connection fails in ~30% of cases.
I have tried filtering all ICE candidates that are not Relay candidates.
I have sniffed traffic between our signalling server (which also controls Kurento) and Kurento to see any difference in the JSON-RPC messages exchanged, but they appear to be essentially the same.
I have tested our STUN and TURN server using this tool: https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/ and I get both server-reflexive and relay candidates that look correct.
I have sniffed the traffic from the clients of a successful and an unsuccessful connection but could not spot a significant difference.
I have simplified the Kurento media pipeline (no recording, no Hubs) but the behavior is the same
I have used different browsers (Chrome, Firefox and a native iOS implementation) but the behavior is the same
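For reference, a relay-only filter of the kind mentioned in the list above can be sketched as follows. This is a minimal illustration, not the exact code from the application: `isRelayCandidate` and `keepRelayOnly` are hypothetical names, and the sketch assumes candidates arrive as objects with a `candidate` string in the usual SDP candidate-attribute form.

```typescript
// Returns true when an ICE candidate line describes a relay (TURN) candidate.
// Candidate lines look roughly like:
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function isRelayCandidate(candidateLine: string): boolean {
  const fields = candidateLine.trim().split(/\s+/);
  const typIndex = fields.indexOf("typ");
  // The candidate type is the token that follows the literal "typ".
  return typIndex !== -1 && fields[typIndex + 1] === "relay";
}

// Hypothetical helper: drops host and server-reflexive candidates, keeps relay ones.
function keepRelayOnly<T extends { candidate: string }>(candidates: T[]): T[] {
  return candidates.filter((c) => isRelayCandidate(c.candidate));
}
```

Filtering like this on both sides forces media through the TURN server, which is the same effect the Firefox force_relay flag has on the browser side.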
Kurento debug logs of a case where the connection could be established:
https://gist.github.com/omnibrain/2bc7ad54f626d278d3c8bac29767ac4c
Kurento debug logs of a case where the connection could NOT be established:
https://gist.github.com/omnibrain/f7caee04a5c6d77ea22a9ccfa95dd825
Looking at your traces, your working case selects candidate 10 and then selects candidate 7, while the non-working case only chooses candidate 10.
kurento_logs_webrtc_working.txt
kurento_logs_webrtc_NOT_working.txt
My first thought was that you were re-using old candidates, but the ports have changed. Changing browsers might change the candidate numbers; I didn't expect them to be deterministic between runs, so I had to look twice.
There's one minor difference in the log: in the non-working case, IceComponentStateChanged changes to connecting after candidate:266015763 appears rather than before. I don't know if that's significant.
General notes:
In the past we've seen a couple of categories of problems:
I'd recommend you use Chrome with chrome://webrtc-internals to help. ICE candidate problems are visible in webrtc-internals, as you can see the state machine walking through its states. There are many more transitions in our working cases than in our broken cases.
Adding client-side listeners for the three ICE events is helpful too:
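The original snippet did not survive here, so the following is only a sketch of what such listeners could look like. It assumes the three events meant are the standard `icecandidate`, `iceconnectionstatechange` and `icegatheringstatechange` events of `RTCPeerConnection`; the helper name `attachIceLogging` and the `IceObservable` type are made up for illustration.

```typescript
// Minimal structural type (an assumption for this sketch) so the helper can be
// exercised without a browser; a real RTCPeerConnection should fit this shape.
interface IceObservable {
  iceConnectionState: string;
  iceGatheringState: string;
  addEventListener(type: string, listener: (event: any) => void): void;
}

// Logs every ICE-related event so the negotiation can be followed in the console.
function attachIceLogging(
  pc: IceObservable,
  log: (line: string) => void = console.log,
): void {
  // Fired once per locally gathered candidate; a null candidate marks the end of gathering.
  pc.addEventListener("icecandidate", (event) => {
    const candidate = event.candidate;
    log(candidate ? `icecandidate: ${candidate.candidate}` : "icecandidate: end of candidates");
  });

  // Connectivity state, e.g. new -> checking -> connected, or -> failed.
  pc.addEventListener("iceconnectionstatechange", () => {
    log(`iceconnectionstatechange: ${pc.iceConnectionState}`);
  });

  // Gathering state: new -> gathering -> complete.
  pc.addEventListener("icegatheringstatechange", () => {
    log(`icegatheringstatechange: ${pc.iceGatheringState}`);
  });
}
```

In a browser this would be called as attachIceLogging(peerConnection) right after the RTCPeerConnection is created, before any offer or answer is set, so that no early state change is missed.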
This lets you see how the negotiation is going but is basically what is in the chrome://webrtc-internals.
Final note: this is what I used in the logging section of /etc/default/kurento-media-server:
I don't remember whether they were better than what you used but I'll throw that out there.