WebRTC

External reference: https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9
External reference: https://codelabs.developers.google.com/codelabs/webrtc-web
External reference: https://en.wikipedia.org/wiki/Session_Description_Protocol
External reference: https://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment
External reference: https://web.dev/articles/webrtc-infrastructure
External reference: https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html

Setting up a call between WebRTC peers involves three tasks:

Create a RTCPeerConnection for each end of the call and, at each end, add the local stream from getUserMedia().

Get and share network information: potential connection endpoints are known as ICE candidates.

Get and share local and remote descriptions: metadata about local media in SDP format.

— https://codelabs.developers.google.com/codelabs/webrtc-web

H.264 is a format that is more universally supported across different browsers

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

AAC finds more universal adoption

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

foundational protocol that powers WebRTC is UDP.

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

DTLS is used within WebRTC to secure and encrypt all data transfers between participants.

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

SCTP and SRTP are used to multiplex the streams and provide both congestion and flow control.

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

every stream sent between participants is encrypted with Secure Real-Time Protocol (SRTP)

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

generate the keys to encrypt the session, WebRTC utilizes DTLS-SRTP

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

signaling server needs to be using the HTTPS protocol, which encrypts the contents sent across the signaling server

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

WebRTC does not handle signaling

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

job of the MCU is to receive media from each participant, decode it, and mix the audio and video from the participants together into a single stream and send it to each participant

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

Each node sends its transcoded media to the SFU, which then forwards this to all the nodes in the session. Unlike the MCU approach, transcoding happens at the edges and not at the server.

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

SFU based approaches tend to scale very well, while keeping the server load to a minimum

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

simple strategy is to use P2P mode when participants are less than 4, and switch to SFUs when it crosses that threshold. Most open-source SFUs today can scale well to about 20–30 participants.

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9
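A minimal sketch of that rule of thumb. The function name and the default threshold parameter are hypothetical, and treating exactly 4 participants as the point where the SFU takes over is an assumption, since the quote only says "less than 4" stays peer-to-peer.

```typescript
// Hypothetical helper: the default limit of 4 comes from the quoted rule of thumb.
type Topology = "p2p" | "sfu";

function chooseTopology(participants: number, p2pLimit: number = 4): Topology {
  if (!Number.isInteger(participants) || participants < 1) {
    throw new RangeError("participants must be a positive integer");
  }
  // Fewer than the limit stays peer-to-peer; anything else goes through an SFU.
  return participants < p2pLimit ? "p2p" : "sfu";
}
```

Under these assumptions chooseTopology(3) stays on "p2p" and chooseTopology(4) switches to "sfu".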

topologies

peer-to-peer

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

Multi-point Control Unit

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

Selective Forwarding Unit

— https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9

more than two peers

External reference: https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

what if you want to hold a meeting with more than two people? How can you leverage powerful WebRTC APIs to build a multi party conferencing application?

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

selective forwarding unit

selective forwarding unit (SFU) as the preferred method of extending WebRTC to multi party conferencing

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

approach is to build a mesh topology in which every participant sends and receives media from every other participant

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

mesh topology quickly reaches scalability limitations. It consumes a lot of bandwidth and client processing power to manage all the media streams.

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

To increase scalability, you need to build a hub-and-spoke topology by inserting a media server into the network

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

Hub-and-spoke topologies can increase latency because media must traverse a longer path from sender to receiver

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

hub-and-spoke topology introduces an intermediary between clients that breaks the WebRTC end-to-end security feature.

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

Without the incorporation of additional end-to-end encryption (E2EE) techniques, a bad actor could potentially monitor the media as it passes through the media server

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

Two types of media servers can be used to implement the hub-and-spoke topology: The multipoint control unit (MCU); and the selective forwarding unit (SFU).

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

MCU decodes each received media stream, rescales them, creates a new tiled stream featuring all participants, encodes and sends it to all clients.

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

MCU is an expensive and compute-intensive infrastructure element

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

MCU completely relieves clients of local processing and it is the most efficient of all the alternatives in its bandwidth utilization.

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

SFU receives media streams from each participant and merely forwards them to the other participants without changes

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
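To make the scalability argument concrete, here is a rough sketch that counts the media streams each client has to send and receive in a call with n participants. It assumes one stream per participant and ignores simulcast; the names clientLoad and ClientLoad are made up for this note.

```typescript
interface ClientLoad {
  uplinks: number;   // streams this client encodes and sends
  downlinks: number; // streams this client receives and decodes
}

function clientLoad(topology: "mesh" | "mcu" | "sfu", n: number): ClientLoad {
  if (!Number.isInteger(n) || n < 2) {
    throw new RangeError("a call needs at least 2 participants");
  }
  switch (topology) {
    case "mesh":
      // Every participant sends to and receives from every other participant.
      return { uplinks: n - 1, downlinks: n - 1 };
    case "mcu":
      // One stream up, one mixed stream down.
      return { uplinks: 1, downlinks: 1 };
    case "sfu":
      // One stream up, one forwarded stream per other participant down.
      return { uplinks: 1, downlinks: n - 1 };
  }
}
```

With 5 participants this gives 4 up and 4 down per client in a mesh, 1 up and 4 down with an SFU, and 1 up and 1 down with an MCU, which matches the bandwidth ranking described in the quotes.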

in an SFU architecture, the client with the least bandwidth dictates the video quality available to all clients

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

Simulcast SFU

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units

designed to prevent a few clients with limited bandwidth resources from degrading the video quality available to all

— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
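A sketch of the idea behind simulcast: each sender uploads several encodings of the same video, and the SFU forwards to each receiver the best one that fits its bandwidth. The layer names and bitrates below are illustrative assumptions, not values from the article or from any particular SFU.

```typescript
interface SimulcastLayer {
  rid: string;         // label of the encoding (hypothetical names below)
  bitrateKbps: number; // approximate bitrate of that encoding
}

// Pick the highest-bitrate layer that fits the receiver's estimated bandwidth,
// falling back to the lowest layer when nothing fits.
function pickLayer(layers: SimulcastLayer[], receiverKbps: number): SimulcastLayer {
  if (layers.length === 0) {
    throw new Error("sender offers no simulcast layers");
  }
  const sorted = [...layers].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  let chosen = sorted[0];
  for (const layer of sorted) {
    if (layer.bitrateKbps <= receiverKbps) {
      chosen = layer;
    }
  }
  return chosen;
}

// Illustrative layers: quarter, half and full resolution.
const layers: SimulcastLayer[] = [
  { rid: "q", bitrateKbps: 150 },
  { rid: "h", bitrateKbps: 500 },
  { rid: "f", bitrateKbps: 1500 },
];
```

A receiver with about 600 kbps gets the "h" layer while a receiver with 5000 kbps still gets "f", so the slow client no longer drags everyone down.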

peerjs

External reference: https://peerjs.com/

PeerJS wraps the browser’s WebRTC implementation to provide a complete, configurable, and easy-to-use peer-to-peer connection API. Equipped with nothing but an ID, a peer can create a P2P data or media stream connection to a remote peer.

— https://peerjs.com/

They even provide the code for a simple signaling server in https://github.com/peers/peerjs-server

Beware though, that it needs a recent enough browser.

Browser support

Firefox  Chrome  Edge  Safari
80+      83+     83+   15+

We test PeerJS against these versions of Chrome, Edge, Firefox, and Safari with BrowserStack to ensure compatibility. It may work in other and older browsers, but we don’t officially support them. Changes to browser support will be a breaking change going forward.

— https://github.com/peers/peerjs

debug

Take a look at chrome://webrtc-internals. This provides WebRTC stats and debugging data

— https://codelabs.developers.google.com/codelabs/webrtc-web

shim adapter

script src="https://webrtc.github.io/adapter/

— https://codelabs.developers.google.com/codelabs/webrtc-web

adapter.js is a shim to insulate apps from spec changes and prefix differences. (Though in fact, the standards and protocols used for WebRTC implementations are highly stable, and there are only a few prefixed names.)

— https://codelabs.developers.google.com/codelabs/webrtc-web

Security

Encryption is mandatory for all WebRTC components, and its JavaScript APIs can only be used from secure origins (HTTPS or localhost).

— https://codelabs.developers.google.com/codelabs/webrtc-web

Sending Data

RTCDataChannel

syntax of RTCDataChannel is deliberately similar to WebSocket, with a send() method and a message event.

— https://codelabs.developers.google.com/codelabs/webrtc-web
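One consequence of that similarity, sketched below: helper code can be written against "anything with a send() method that accepts a string" and used with either a WebSocket or an RTCDataChannel. The Sendable interface, the helper names and the fake channel are assumptions made for this note so that the snippet runs outside a browser.

```typescript
// Structural type: both WebSocket and RTCDataChannel have a send() that accepts a string.
interface Sendable {
  send(data: string): void;
}

function sendJson(channel: Sendable, payload: unknown): void {
  channel.send(JSON.stringify(payload));
}

// A message event carries the payload in its data property.
function readJson(event: { data: unknown }): unknown {
  return typeof event.data === "string" ? JSON.parse(event.data) : event.data;
}

// Stand-in channel that records what was sent (not a real data channel).
const sent: string[] = [];
const fakeChannel: Sendable = {
  send: (data: string): void => {
    sent.push(data);
  },
};

sendJson(fakeChannel, { hello: "world" });
```

In a browser, the same sendJson call would accept a real RTCDataChannel once it is open.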

Communication path

WebRTC is designed to work peer-to-peer, so users can connect by the most direct route possible

— https://codelabs.developers.google.com/codelabs/webrtc-web

Once RTCPeerConnection has that information, the ICE magic happens automatically. RTCPeerConnection uses the ICE framework to work out the best path between peers, working with STUN and TURN servers as necessary

— https://web.dev/articles/webrtc-infrastructure

RTCPeerConnection

RTCPeerConnection is the API used by WebRTC apps to create a connection between peers, and communicate audio and video.

— https://web.dev/articles/webrtc-infrastructure

NAT traversal

STUN: to discover the public IP:port under which the NAT exposes the client
TURN: to relay data when no direct path works, even with the address discovered through STUN

WebRTC APIs use STUN servers to get the IP address of your computer, and TURN servers to function as relay servers in case peer-to-peer communication fails

— https://codelabs.developers.google.com/codelabs/webrtc-web

In other words, a STUN server is used to get an external network address and TURN servers are used to relay traffic if direct (peer-to-peer) connection fails.

— https://web.dev/articles/webrtc-infrastructure
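As a sketch, this is roughly what the configuration handed to RTCPeerConnection looks like. The iceServers, urls, username and credential field names follow the browser API; the local interfaces, the helper function, the server hostnames and the credentials are placeholders invented for this note.

```typescript
// Local copies of the relevant shapes so the snippet runs outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

function buildPeerConfig(
  stunUrl: string,
  turn?: { url: string; username: string; credential: string }
): PeerConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    // TURN servers normally require credentials.
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return { iceServers };
}

// Placeholder hosts and credentials, not real servers.
const config = buildPeerConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo",
  credential: "secret",
});
// In a browser: new RTCPeerConnection(config)
```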

Session Traversal Utilities for NAT

External reference: https://blog.ivrpowers.com/post/technologies/what-is-stun-turn-server/

app uses a STUN server to discover its IP:port from a public perspective. This process enables a WebRTC peer to get a publicly accessible address for itself and then pass it to another peer through a signaling mechanism in order to set up a direct link. (In practice, different NATs work in different ways and there may be multiple NAT layers, but the principle is still the same.)

— https://web.dev/articles/webrtc-infrastructure

STUN servers live on the public internet and have one simple task - check the IP:port address of an incoming request (from an app running behind a NAT) and send that address back as a response

— https://web.dev/articles/webrtc-infrastructure
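A toy model of that exchange, with no real networking: a NAT rewrites the source address of outgoing packets, and the STUN server simply echoes back the source address it observed. ToyNat, stunBindingResponse, the addresses and the starting port are all made up for illustration.

```typescript
interface Endpoint {
  ip: string;
  port: number;
}

// Toy NAT: maps each private endpoint to a port on one public IP.
class ToyNat {
  private publicIp: string;
  private nextPort = 40000; // arbitrary starting port for the illustration
  private mappings = new Map<string, Endpoint>();

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  translate(source: Endpoint): Endpoint {
    const key = `${source.ip}:${source.port}`;
    let mapped = this.mappings.get(key);
    if (!mapped) {
      mapped = { ip: this.publicIp, port: this.nextPort++ };
      this.mappings.set(key, mapped);
    }
    return mapped;
  }
}

// The STUN server's single task: report the address the request came from.
function stunBindingResponse(observedSource: Endpoint): Endpoint {
  return { ip: observedSource.ip, port: observedSource.port };
}

const nat = new ToyNat("203.0.113.7");
const privateEndpoint: Endpoint = { ip: "192.168.1.20", port: 5000 };
const reflexive = stunBindingResponse(nat.translate(privateEndpoint));
```

Here reflexive holds the public address the client can pass to its peer over signaling, while the client itself only ever knew 192.168.1.20:5000.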

Sometimes, you can use a protocol called STUN (Session Traversal Utilities for NAT) that allows clients to discover their public IP address and the type of NAT they are behind. This information is used to establish the media connection. In most cases, a STUN server is only used during the connection setup and once that session has been established, media will flow directly between the peer and the Video Gateway (WebRTC).

— https://blog.ivrpowers.com/post/technologies/what-is-stun-turn-server/

However, even if we properly set up a STUN server, there are very restrictive corporate networks (e.g. UDP traffic forbidden, only 443 TCP allowed…), which will require clients to use a TURN (Traversal Using Relays around NAT) server to relay traffic if direct (peer to Video Gateway) connection fails. In these cases, you can install our TURN server (in another instance) to solve these issues

— https://blog.ivrpowers.com/post/technologies/what-is-stun-turn-server/

Traversal Using Relays around NAT

Every TURN server supports STUN. A TURN server is a STUN server with additional built-in relaying functionality.

— https://web.dev/articles/webrtc-infrastructure

TURN is used to relay audio, video, and data streaming between peers, not signaling data!

— https://web.dev/articles/webrtc-infrastructure

Interactive Connectivity Establishment

expression ‘finding candidates’ refers to the process of finding network interfaces and ports using the ICE framework.

— https://codelabs.developers.google.com/codelabs/webrtc-web

Interactive Connectivity Establishment (ICE) is a technique used in computer networking to find ways for two computers to talk to each other as directly as possible in peer-to-peer networking

— https://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment

Alice and Eve also need to exchange network information. The expression “finding candidates” refers to the process of finding network interfaces and ports using the ICE framework.

Alice creates an RTCPeerConnection object with an onicecandidate handler.

The handler is called when network candidates become available.

In the handler, Alice sends stringified candidate data to Eve through their signaling channel.

When Eve gets a candidate message from Alice, she calls addIceCandidate() to add the candidate to the remote peer description

— https://web.dev/articles/webrtc-infrastructure
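To see what the "stringified candidate data" contains, here is a small sketch that serializes a candidate the way it would travel over the signaling channel and then parses the candidate line on the receiving side. The field layout follows the standard ICE candidate attribute; the parser, its type names and the sample values are assumptions for illustration, and optional trailing fields such as raddr and rport are ignored.

```typescript
interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string; // host, srflx, prflx or relay
}

function parseCandidate(line: string): ParsedCandidate {
  const body = line.replace(/^a=/, "").replace(/^candidate:/, "");
  const parts = body.trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error(`not an ICE candidate line: ${line}`);
  }
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7],
  };
}

// What Alice would send: the object shape matches what onicecandidate provides.
const wire = JSON.stringify({
  candidate: "candidate:1 1 udp 2130706431 192.168.1.20 5000 typ host",
  sdpMid: "0",
  sdpMLineIndex: 0,
});

// What Eve would do before calling addIceCandidate() with the parsed object.
const received = JSON.parse(wire) as { candidate: string };
const parsed = parseCandidate(received.candidate);
```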

ICE tries to find the best path to connect peers. It tries all possibilities in parallel and chooses the most efficient option that works. ICE first tries to make a connection using the host address obtained from a device’s operating system and network card. If that fails (which it will for devices behind NATs), ICE obtains an external address using a STUN server and, if that fails, traffic is routed through a TURN relay server

— https://web.dev/articles/webrtc-infrastructure
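The preference for host addresses over STUN-derived ones over TURN relays shows up in the candidate priority formula from the ICE specification (RFC 8445), sketched below with the type preferences that specification recommends. The function names are made up, and real ICE ranks candidate pairs rather than single candidates, so this is only the first ingredient.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(
  type: CandidateType,
  localPreference: number = 65535,
  componentId: number = 1
): number {
  // Plain multiplication: the result does not fit in a signed 32-bit shift.
  return 16777216 * TYPE_PREFERENCE[type] + 256 * localPreference + (256 - componentId);
}

function orderByPriority(types: CandidateType[]): CandidateType[] {
  return [...types].sort((a, b) => candidatePriority(b) - candidatePriority(a));
}
```

With the defaults, a host candidate gets priority 2130706431, a server-reflexive (STUN) one 1694498815 and a relay (TURN) one 16777215, so the relay is only preferred when nothing else connects.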

JavaScript Session Establishment Protocol

External reference: https://rtcweb-wg.github.io/jsep/

JSEP requires the exchange between peers of offer and answer, the media metadata mentioned above. Offers and answers are communicated in Session Description Protocol (SDP) format

— https://web.dev/articles/webrtc-infrastructure

this document describes the JavaScript Session Establishment Protocol (JSEP) that allows for full control of the signaling state machine from JavaScript

— https://rtcweb-wg.github.io/jsep/

JSEP assumes a model in which a JavaScript application executes inside a runtime containing WebRTC APIs

— https://rtcweb-wg.github.io/jsep/

Session Description Protocol

Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation.

— https://en.wikipedia.org/wiki/Session_Description_Protocol
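SDP is line oriented: each line is a single-character type, an equals sign, and a value. A minimal sketch of reading one follows; the sample session is hand-written and heavily shortened compared to what a browser generates, and the function names are hypothetical.

```typescript
interface SdpLine {
  type: string;  // single character, such as v, o, s, t, m or a
  value: string;
}

function parseSdp(sdp: string): SdpLine[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => {
      if (line.indexOf("=") !== 1) {
        throw new Error(`malformed SDP line: ${line}`);
      }
      return { type: line[0], value: line.slice(2) };
    });
}

// Each m= line opens a media section; its first token is the media kind.
function mediaKinds(lines: SdpLine[]): string[] {
  return lines.filter((l) => l.type === "m").map((l) => l.value.split(" ")[0]);
}

// Shortened, hand-written sample; real offers carry many more attributes.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");
```

For this sample, mediaKinds(parseSdp(sampleSdp)) lists one audio section and one video section.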

gory detail:

Alice creates an RTCPeerConnection object.

Alice creates an offer (an SDP session description) with the RTCPeerConnection createOffer() method.

Alice calls setLocalDescription() with her offer.

Alice stringifies the offer and uses a signaling mechanism to send it to Eve.

Eve calls setRemoteDescription() with Alice’s offer, so that her RTCPeerConnection knows about Alice’s setup.

Eve calls createAnswer() and the success callback for this is passed a local session description - Eve’s answer.

Eve sets her answer as the local description by calling setLocalDescription().

Eve then uses the signaling mechanism to send her stringified answer to Alice

— https://web.dev/articles/webrtc-infrastructure

Alice sets Eve’s answer as the remote session description using setRemoteDescription().

— https://web.dev/articles/webrtc-infrastructure
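The offer/answer steps can be traced with a toy model that runs without a browser. ToyPeer is not the real RTCPeerConnection: it only mimics the method names from the steps and the signaling states (stable, have-local-offer, have-remote-offer), and the SDP strings are placeholders.

```typescript
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

interface Description {
  type: "offer" | "answer";
  sdp: string;
}

class ToyPeer {
  name: string;
  state: SignalingState = "stable";
  local?: Description;
  remote?: Description;

  constructor(name: string) {
    this.name = name;
  }

  createOffer(): Description {
    return { type: "offer", sdp: `sdp-from-${this.name}` };
  }

  createAnswer(): Description {
    if (this.state !== "have-remote-offer") {
      throw new Error("no remote offer to answer");
    }
    return { type: "answer", sdp: `sdp-from-${this.name}` };
  }

  setLocalDescription(d: Description): void {
    if (d.type === "offer" && this.state === "stable") {
      this.state = "have-local-offer";
    } else if (d.type === "answer" && this.state === "have-remote-offer") {
      this.state = "stable";
    } else {
      throw new Error(`cannot set local ${d.type} now`);
    }
    this.local = d;
  }

  setRemoteDescription(d: Description): void {
    if (d.type === "offer" && this.state === "stable") {
      this.state = "have-remote-offer";
    } else if (d.type === "answer" && this.state === "have-local-offer") {
      this.state = "stable";
    } else {
      throw new Error(`cannot set remote ${d.type} now`);
    }
    this.remote = d;
  }
}

// Stand-in for the signaling channel: stringify on one side, parse on the other.
const viaSignaling = (d: Description): Description =>
  JSON.parse(JSON.stringify(d)) as Description;

const alice = new ToyPeer("alice");
const eve = new ToyPeer("eve");

const offer = alice.createOffer();
alice.setLocalDescription(offer);
eve.setRemoteDescription(viaSignaling(offer));
const answer = eve.createAnswer();
eve.setLocalDescription(answer);
alice.setRemoteDescription(viaSignaling(answer));
```

After the last call both peers are back in the stable state, each holding its own description as local and the other's as remote.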

Signaling is the process used in WebRTC to detect peers, exchange session descriptions to set up media ports, and share everything needed for the initial handshake.

— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html

Nearly all WebRTC experiments rely on channels. “Channel” is a term used in realtime protocols like WebSocket for making sure data is transmitted privately, only among the relevant clients.

— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html

Channels are created dynamically for each peer; to make sure SDP/ICE is exchanged among relevant users.

— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
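A sketch of the channel idea as an in-memory relay: clients join a named channel, and a message sent on it reaches only the other members. ToySignalingHub and the channel and client names are invented for this note; a real signaling server would do the same over WebSocket or a similar transport.

```typescript
type Handler = (from: string, message: string) => void;

class ToySignalingHub {
  private channels = new Map<string, Map<string, Handler>>();

  join(channel: string, clientId: string, onMessage: Handler): void {
    const members = this.channels.get(channel) ?? new Map<string, Handler>();
    members.set(clientId, onMessage);
    this.channels.set(channel, members);
  }

  send(channel: string, from: string, message: string): void {
    const members = this.channels.get(channel);
    if (!members || !members.has(from)) {
      throw new Error(`${from} has not joined ${channel}`);
    }
    // Relay to everyone in the channel except the sender.
    members.forEach((handler, clientId) => {
      if (clientId !== from) {
        handler(from, message);
      }
    });
  }
}

const hub = new ToySignalingHub();
const inbox = { alice: [] as string[], eve: [] as string[], mallory: [] as string[] };

hub.join("call-1", "alice", (from, m) => { inbox.alice.push(`${from}:${m}`); });
hub.join("call-1", "eve", (from, m) => { inbox.eve.push(`${from}:${m}`); });
hub.join("call-2", "mallory", (from, m) => { inbox.mallory.push(`${from}:${m}`); });

hub.send("call-1", "alice", "offer");
```

Only eve receives alice's message; mallory, who sits in another channel, sees nothing.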

What about LAN or intranet?

You always need a signaling gateway, whether it is installed publicly or privately. A gateway can be a copy/paste mechanism or a realtime protocol.

— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html

Signaling is used to detect peers; and exchange prerequisites to setup media connections.

— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html

ICE, which stands for Interactive Connectivity Establishment, is a protocol used to capture the public IP addresses of the user. It lets us know:

Public IP addresses of the user

Whether each address is IPv4 or IPv6

Whether UDP is blocked or not; if it is, fall back to TCP; failing that, fall back to a custom protocol

— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html

signaling

WebRTC uses RTCPeerConnection to communicate streaming data between browsers, but also needs a mechanism to coordinate communication and to send control messages, a process known as signaling

— https://codelabs.developers.google.com/codelabs/webrtc-web

Signaling methods and protocols are not specified by WebRTC

— https://codelabs.developers.google.com/codelabs/webrtc-web

If you don’t want to roll your own, there are several WebRTC signaling servers available

— https://web.dev/articles/webrtc-infrastructure

Whatever you choose, you need an intermediary server to exchange signaling messages and app data between clients.

— https://web.dev/articles/webrtc-infrastructure

message service for signaling needs to be bidirectional: client to server and server to client

— https://web.dev/articles/webrtc-infrastructure

the EventSource API has been widely implemented. This enables server-sent events - data sent from a web server to a browser client through HTTP.

— https://web.dev/articles/webrtc-infrastructure

WebSocket is a more-natural solution, designed for full duplex client–server communication

— https://web.dev/articles/webrtc-infrastructure

also possible to handle signaling by getting WebRTC clients to poll a messaging server repeatedly through Ajax, but that leads to a lot of redundant network requests, which is especially problematic for mobile devices

— https://web.dev/articles/webrtc-infrastructure

design of Socket.io makes it simple to build a service to exchange messages and Socket.io is particularly suited to WebRTC signaling because of its built-in concept of rooms

— https://web.dev/articles/webrtc-infrastructure

signaling protocols and mechanisms are not defined by WebRTC standards

— https://web.dev/articles/webrtc-infrastructure

Signaling is the process of coordinating communication

— https://web.dev/articles/webrtc-infrastructure

exchange the following information:

Session-control messages used to open or close communication

Error messages

Media metadata, such as codecs, codec settings, bandwidth, and media types

Key data used to establish secure connections

Network data, such as a host’s IP address and port as seen by the outside world

— https://web.dev/articles/webrtc-infrastructure
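Since WebRTC leaves the signaling format open, one possible way to model these message categories is a tagged union serialized as JSON. Everything below (type names, the kind tags, the helper functions) is an assumption made for this note, not a standard. Key data has no tag of its own here because the DTLS fingerprint travels inside the SDP.

```typescript
type SignalingMessage =
  | { kind: "session-control"; action: "open" | "close" }
  | { kind: "error"; reason: string }
  | { kind: "media-metadata"; description: { type: "offer" | "answer"; sdp: string } }
  | { kind: "network-data"; candidate: string };

const KINDS: string[] = ["session-control", "error", "media-metadata", "network-data"];

function encode(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Only checks the tag; a production decoder would validate every field.
function decode(raw: string): SignalingMessage {
  const value = JSON.parse(raw);
  if (value === null || typeof value !== "object" || !KINDS.includes(value.kind)) {
    throw new Error("unknown signaling message kind");
  }
  return value as SignalingMessage;
}

function summarize(message: SignalingMessage): string {
  switch (message.kind) {
    case "session-control":
      return `session ${message.action}`;
    case "error":
      return `error: ${message.reason}`;
    case "media-metadata":
      return `SDP ${message.description.type}`;
    case "network-data":
      return `candidate ${message.candidate}`;
  }
}
```

The switch in summarize is exhaustive, so adding a new kind to the union makes the compiler point at every place that has to handle it.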

Konubinix' opinionated web of thoughts
