Understanding ICE for Teams Media

Last week, I wrote an article about how to use Google Chrome or Edge to troubleshoot Teams media issues. That article covered the process step by step, but I also wanted to share some more details about the ICE protocol, because it can be important to understand. Here is the article link from last week: How to Capture WebRTC Logs

Overview

In this part 1 article, I will go through some of the basic concepts of how ICE gathers candidates and sends an offer. Though Microsoft follows the RFC standard, it also has its own proprietary ICE extensions. Keep this in mind, especially when you are trying to integrate with other systems.

ICE (Interactive Connectivity Establishment) is the term used to describe how Microsoft Teams discovers and chooses media paths between two endpoints. This process is based on RFC 5245: https://tools.ietf.org/html/rfc5245.

The sequence of operations in ICE as we do it is roughly:

Gather candidates
Send an offer
Build candidate pairs
Test those pairs
Nominate a final path

Gather candidates

Agent: As defined in RFC 3264, an agent is the protocol implementation involved in the offer/answer exchange. There are two agents involved in an offer/answer exchange.

In a typical ICE deployment, we have two endpoints (known as AGENTS) that want to communicate. They are able to communicate indirectly via some signalling protocol (such as SIP), by which they can perform an offer/answer exchange of SDP messages. At the beginning of the ICE process, the agents are ignorant of their own topologies. In particular, they might or might not be behind a NAT (or multiple tiers of NATs). ICE allows the agents to discover enough information about their topologies to potentially find one or more paths by which they can communicate. The basic idea behind ICE is as follows: each agent has a variety of candidate TRANSPORT ADDRESSES (a combination of IP address and port for a particular transport protocol, such as TCP or UDP) that it could use to communicate with the other agent.

In order to execute ICE, an agent has to identify all of its address candidates. A CANDIDATE is a transport address, that is, a combination of IP address and port for a particular transport protocol. RFC 5245 defines three types of candidates, some derived from physical or logical network interfaces, others discoverable via STUN and TURN. Naturally, one viable candidate is a transport address obtained directly from a local interface. Such a candidate is called a HOST CANDIDATE. The local interface could be Ethernet or WiFi, or it could be one that is obtained through a tunnel mechanism, such as a Virtual Private Network (VPN) or Mobile IP (MIP). In all cases, such a network interface appears to the agent as a local interface from which ports (and thus candidates) can be allocated.

If an agent is multihomed, it obtains a candidate from each IP address. Depending on the location of the PEER (the other agent in the session) on the IP network relative to the agent, the agent may be reachable by the peer through one or more of those IP addresses.

Next, the agent uses STUN or TURN to obtain additional candidates. These come in two flavors: translated addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES) and addresses on TURN servers (RELAYED CANDIDATES). When TURN servers are utilized, both types of candidates are obtained from the TURN server. If only STUN servers are utilized, only server reflexive candidates are obtained from them.

An agent gathers candidates when it believes that communication is imminent. An offerer can do this based on a user interface cue, or based on an explicit request to initiate a session. Every candidate is a transport address. It also has a type and a base. The base of a candidate is the candidate that an agent must send from when using that candidate.

So gathering candidates means figuring out all the different ways that a client can be contacted. One easy candidate is the local IP address of your machine: that is one way you can be contacted. Other candidates are harder to get. For example, you can connect to a relay so that the relay will act as a packet recipient on your behalf, forwarding packets it receives for you to your client. The relay can also inform you of your server reflexive address, which is the public IP and port given to your traffic by your NAT.
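To make the idea of a candidate with a type and a base more concrete, here is a minimal Python sketch. The class and function names are my own and are not taken from Teams or from any library, and the IP addresses are documentation examples. It assumes the RFC 5245 rules that a host candidate and a relayed candidate are each their own base, while a server reflexive candidate has the host candidate it was derived from as its base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportAddress:
    ip: str
    port: int
    protocol: str  # "udp" or "tcp"


@dataclass(frozen=True)
class Candidate:
    address: TransportAddress  # where the peer should send media
    kind: str                  # "host", "srflx" (server reflexive) or "relay"
    base: TransportAddress     # the local address the agent actually sends from


def host_candidate(ip: str, port: int, protocol: str = "udp") -> Candidate:
    # A host candidate comes straight from a local interface and is its own base.
    addr = TransportAddress(ip, port, protocol)
    return Candidate(address=addr, kind="host", base=addr)


def server_reflexive_candidate(host: Candidate, public_ip: str, public_port: int) -> Candidate:
    # The public address the NAT assigned; its base is the host candidate behind it.
    addr = TransportAddress(public_ip, public_port, host.address.protocol)
    return Candidate(address=addr, kind="srflx", base=host.address)


def relayed_candidate(relay_ip: str, relay_port: int, protocol: str = "udp") -> Candidate:
    # An address allocated on a TURN server; a relayed candidate is its own base.
    addr = TransportAddress(relay_ip, relay_port, protocol)
    return Candidate(address=addr, kind="relay", base=addr)


host = host_candidate("10.0.0.5", 50000)
srflx = server_reflexive_candidate(host, "203.0.113.10", 61000)
relay = relayed_candidate("198.51.100.7", 52000)

for cand in (host, srflx, relay):
    print(cand.kind, cand.address.ip, cand.address.port, "base:", cand.base.ip)
```

The three objects at the end correspond to the three ways a client behind a NAT can be reached: directly, through its NAT mapping, or through a relay.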

Send an offer

Once Agent A has gathered all of its candidates, it orders them from highest to lowest priority and sends them to Agent B over the signaling channel.

The prioritization process results in the assignment of a priority to each candidate. Each candidate for a media stream MUST have a unique priority that MUST be a positive integer between 1 and 2147483647 (2^31 - 1). By the way, 2147483647 is the eighth Mersenne prime.

The priority is computed using the following formula:

priority = (2^24)*(type preference) + (2^8)*(local preference) + (2^0)*(256 - component ID)

Type preference must be an integer between 0 and 126, where 126 represents the highest priority and 0 is the lowest. So setting this to 0 means that candidate type will be used as a last resort. The local preference must be an integer between 0 and 65535. It represents a preference for the particular IP address, in case the agent is multihomed. 65535 represents the highest and 0 the lowest priority. Component ID is an integer and must be between 1 and 256 inclusive.
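As a worked example of the formula, here is a small Python sketch. The function name and the TYPE_PREFERENCE table are my own. The type preference values are the ones RFC 5245 recommends (126 for host, 100 for server reflexive, 0 for relayed); a Teams client may use different values, so treat them as an assumption.

```python
# Type preferences recommended by RFC 5245 (assumption: Teams may differ).
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(type_preference: int, local_preference: int, component_id: int) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)"""
    if not 0 <= type_preference <= 126:
        raise ValueError("type preference must be between 0 and 126")
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must be between 0 and 65535")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    return (1 << 24) * type_preference + (1 << 8) * local_preference + (256 - component_id)


# Single-homed agent (local preference 65535), component 1.
print(candidate_priority(TYPE_PREFERENCE["host"], 65535, 1))   # 2130706431
print(candidate_priority(TYPE_PREFERENCE["srflx"], 65535, 1))  # 1694498815
print(candidate_priority(TYPE_PREFERENCE["relay"], 65535, 1))  # 16777215
```

Because the type preference is multiplied by 2^24, it dominates the result: with these values, any host candidate outranks any server reflexive candidate, which in turn outranks any relayed candidate, regardless of local preference or component ID.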

This priority will be used by ICE to determine the order of the connectivity checks and the relative preference for candidates.

Next, the agent eliminates redundant candidates. A candidate is redundant if its transport address is the same as that of another candidate and its base equals the base of that other candidate. The agent will eliminate the redundant candidate with the lower priority.

The candidates are carried in attributes in the SDP offer. In the SDP offer, an agent will include an m line for each media stream it wishes to use. If there are multiple m lines, ICE will perform connectivity checks for the first m line first. So Teams will normally use the audio m line first and then video, which means that audio can start flowing before the other m lines are processed.
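The redundancy rule can be sketched in a few lines of Python. The Candidate tuple and the function name are hypothetical and only serve the illustration. The example models a client that sits directly on a public address with no NAT, so the server reflexive address it learns is identical to its host address and has the same base.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    kind: str        # "host", "srflx" or "relay"
    address: tuple   # (ip, port, protocol)
    base: tuple      # (ip, port, protocol)
    priority: int


def eliminate_redundant(candidates):
    """Keep only the highest-priority candidate for each (address, base) combination."""
    best = {}
    for cand in candidates:
        key = (cand.address, cand.base)
        kept = best.get(key)
        if kept is None or cand.priority > kept.priority:
            best[key] = cand
    # Return the survivors ordered from highest to lowest priority.
    return sorted(best.values(), key=lambda c: c.priority, reverse=True)


public = ("203.0.113.10", 50000, "udp")
relayed = ("198.51.100.7", 52000, "udp")

gathered = [
    Candidate("host", public, public, 2130706431),
    Candidate("srflx", public, public, 1694498815),  # same address and base as the host candidate
    Candidate("relay", relayed, relayed, 16777215),
]

for cand in eliminate_redundant(gathered):
    print(cand.kind, cand.address, cand.priority)
```

Here the server reflexive candidate is dropped, because it duplicates the host candidate and has the lower priority.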
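To show what a candidate looks like inside the SDP, here is a hedged Python sketch that parses an a=candidate attribute in the format defined by RFC 5245 (foundation, component ID, transport, priority, address, port, then "typ" and the candidate type, followed by optional name/value pairs such as raddr and rport). The function name is my own and the sample lines are illustrative, not captured from a Teams call. Because Microsoft has its own ICE extensions, real Teams SDP may contain variations that this simple parser does not handle.

```python
def parse_candidate_line(line: str) -> dict:
    """Parse an RFC 5245 style 'a=candidate:' SDP attribute into a dictionary."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not a candidate attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate attribute")
    candidate = {
        "foundation": fields[0],
        "component_id": int(fields[1]),
        "transport": fields[2].upper(),
        "priority": int(fields[3]),
        "ip": fields[4],
        "port": int(fields[5]),
        "type": fields[7],
    }
    # Anything after the type comes in name/value pairs, e.g. raddr and rport.
    extras = fields[8:]
    candidate["extensions"] = dict(zip(extras[0::2], extras[1::2]))
    return candidate


offer_lines = [
    "a=candidate:1 1 UDP 2130706431 10.0.1.1 8998 typ host",
    "a=candidate:2 1 UDP 1694498815 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998",
]

for sdp_line in offer_lines:
    parsed = parse_candidate_line(sdp_line)
    print(parsed["type"], parsed["ip"], parsed["port"], parsed["priority"], parsed["extensions"])
```

For the server reflexive candidate, raddr and rport carry the related (base) address and port, which is handy when you are reading logs and want to match a public address back to the local interface it came from.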

When Agent B receives the offer, it performs the same gathering process and responds with its own list of candidates.

At the end of this process, each agent has a complete list of both its candidates and its peer's candidates. It pairs them up, resulting in CANDIDATE PAIRS. To see which pairs work, each agent schedules a series of CHECKS. Each check is a STUN request/response transaction that the client will perform on a particular candidate pair by sending a STUN request from the local candidate to the remote candidate.
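The pairing step can be sketched as follows. This is a simplification with a function name of my own: it assumes the RFC 5245 rule that a local candidate is paired with a remote candidate only when both have the same component ID and the same IP address version, and it leaves out pair prioritization and the connectivity checks themselves. The addresses are documentation examples.

```python
import ipaddress
from itertools import product


def build_candidate_pairs(local, remote):
    """Pair every local candidate with every compatible remote candidate."""
    pairs = []
    for loc, rem in product(local, remote):
        same_component = loc["component_id"] == rem["component_id"]
        same_ip_version = (
            ipaddress.ip_address(loc["ip"]).version
            == ipaddress.ip_address(rem["ip"]).version
        )
        if same_component and same_ip_version:
            pairs.append((loc, rem))
    return pairs


local_candidates = [
    {"type": "host", "ip": "10.0.0.5", "port": 50000, "component_id": 1},
    {"type": "relay", "ip": "198.51.100.7", "port": 52000, "component_id": 1},
    {"type": "host", "ip": "2001:db8::1", "port": 50002, "component_id": 1},
]
remote_candidates = [
    {"type": "host", "ip": "192.0.2.20", "port": 40000, "component_id": 1},
    {"type": "host", "ip": "192.0.2.20", "port": 40001, "component_id": 2},
]

for loc, rem in build_candidate_pairs(local_candidates, remote_candidates):
    print(loc["type"], loc["ip"], "->", rem["type"], rem["ip"], rem["port"])
```

In this example only two pairs survive: the local IPv6 candidate has no IPv6 counterpart, and nothing on the local side matches the remote candidate for component 2.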

With both agents performing a check on a candidate pair, the result is a 4-way handshake.

Once all desired candidates have been gathered, they are collected into a list known as an offer. It is important to note that there is no fixed set of candidates that must be included in an offer; clients can modify their offers as desired. This offer is delivered to the peer, which allows it to then proceed to build candidate pairs.

Krisz's Microsoft Teams adventure