WebRTC
Fleeting
- External reference: https://medium.com/secure-meeting/webrtc-architecture-basics-p2p-sfu-mcu-and-hybrid-approaches-e2aea14c80f9
- External reference: https://codelabs.developers.google.com/codelabs/webrtc-web
- External reference: https://en.wikipedia.org/wiki/Session_Description_Protocol
- External reference: https://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment
- External reference: https://web.dev/articles/webrtc-infrastructure
- External reference: https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
Setting up a call between WebRTC peers involves three tasks:
- Create an RTCPeerConnection for each end of the call and, at each end, add the local stream from getUserMedia().
- Get and share network information: potential connection endpoints are known as ICE candidates.
- Get and share local and remote descriptions: metadata about local media in SDP format.
— https://codelabs.developers.google.com/codelabs/webrtc-web
H.264 is a video format that is more universally supported across different browsers.
For audio, AAC finds more universal adoption.
The foundational protocol that powers WebRTC is UDP.
DTLS is used within WebRTC to secure and encrypt all data transfers between participants.
SCTP and SRTP are used to multiplex the streams and provide both congestion and flow control.
Every media stream sent between participants is encrypted with the Secure Real-time Transport Protocol (SRTP).
To generate the keys that encrypt the session, WebRTC uses DTLS-SRTP.
The signaling server needs to use HTTPS, which encrypts the contents sent across the signaling server.
WebRTC does not handle signaling
The job of the MCU is to receive media from each participant, decode it, mix the audio and video from the participants into a single stream, and send that stream to each participant.
Each node sends its transcoded media to the SFU, which then forwards it to all the nodes in the session. Unlike the MCU approach, transcoding happens at the edges and not at the server.
SFU-based approaches tend to scale very well while keeping the server load to a minimum.
A simple strategy is to use P2P mode when there are fewer than 4 participants and switch to an SFU when the count crosses that threshold. Most open-source SFUs today can scale well to about 20–30 participants.
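The P2P-or-SFU rule of thumb above fits in a few lines. This is a minimal sketch. The threshold of 4 is the article's suggestion, not a value from any WebRTC spec, and `chooseCallMode` is a hypothetical helper name.

```typescript
// The two modes the quoted strategy switches between.
type CallMode = "p2p" | "sfu";

// Rule-of-thumb threshold from the article quoted above; not a standard value.
const SFU_THRESHOLD = 4;

// Hypothetical helper: decide how to route media for a call of `participants` people.
function chooseCallMode(participants: number): CallMode {
  if (!Number.isInteger(participants) || participants < 2) {
    throw new Error("a call needs at least two participants");
  }
  // Below the threshold every peer connects directly to every other peer;
  // at or above it, media is routed through a Selective Forwarding Unit.
  return participants < SFU_THRESHOLD ? "p2p" : "sfu";
}
```

With this sketch, `chooseCallMode(3)` gives "p2p" and `chooseCallMode(4)` gives "sfu". A real app would also have to migrate an ongoing call when someone joins or leaves, which this ignores.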
topologies
- peer-to-peer (P2P)
- Multipoint Control Unit (MCU)
- Selective Forwarding Unit (SFU)
more than two peers
- External reference: https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
What if you want to hold a meeting with more than two people? How can you leverage powerful WebRTC APIs to build a multi-party conferencing application?
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
selective forwarding unit
selective forwarding unit (SFU) as the preferred method of extending WebRTC to multi party conferencing
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
One approach is to build a mesh topology in which every participant sends media to, and receives media from, every other participant
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
A mesh topology quickly reaches scalability limits: it consumes a lot of bandwidth and client processing power to manage all the media streams.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
To increase scalability, you need to build a hub-and-spoke topology by inserting a media server into the network
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
Hub-and-spoke topologies can increase latency because media must traverse a longer path from sender to receiver
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
A hub-and-spoke topology introduces an intermediary between clients, which breaks WebRTC's end-to-end security.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
Without the incorporation of additional end-to-end encryption (E2EE) techniques, a bad actor could potentially monitor the media as it passes through the media server
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
Two types of media servers can be used to implement the hub-and-spoke topology: the multipoint control unit (MCU) and the selective forwarding unit (SFU).
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
An MCU decodes each received media stream, rescales it, creates a new tiled stream featuring all participants, and encodes and sends that stream to all clients.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
An MCU is an expensive, compute-intensive infrastructure element.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
An MCU completely relieves clients of local processing, and it is the most efficient of all the alternatives in bandwidth utilization.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
An SFU receives media streams from each participant and merely forwards them to the other participants without changes.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
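Counting streams makes the mesh/MCU/SFU trade-off concrete. The sketch makes three assumptions:

- each participant publishes exactly one media stream (audio and video counted together);
- simulcast is ignored;
- `perClientStreams` and `meshConnections` are hypothetical helper names.

```typescript
type Topology = "mesh" | "mcu" | "sfu";

interface ClientStreams {
  upstreams: number;   // media streams each client sends
  downstreams: number; // media streams each client receives
}

function perClientStreams(topology: Topology, n: number): ClientStreams {
  if (!Number.isInteger(n) || n < 2) throw new Error("need at least two participants");
  switch (topology) {
    case "mesh":
      // Every participant sends to and receives from every other participant.
      return { upstreams: n - 1, downstreams: n - 1 };
    case "mcu":
      // One stream up to the MCU, one mixed (tiled) stream back down.
      return { upstreams: 1, downstreams: 1 };
    case "sfu":
      // One stream up to the SFU, which forwards every other participant's stream unchanged.
      return { upstreams: 1, downstreams: n - 1 };
  }
}

// Total number of peer connections a full mesh needs.
function meshConnections(n: number): number {
  return (n * (n - 1)) / 2;
}
```

With 5 participants, a mesh needs 10 peer connections and each client both sends and receives 4 streams. An SFU client sends 1 and receives 4. An MCU client sends 1 and receives 1, which matches the bandwidth ranking quoted above.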
In an SFU architecture, the client with the least bandwidth dictates the video quality available to all clients.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
Simulcast SFU
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
A simulcast SFU is designed to prevent a few clients with limited bandwidth from degrading the video quality available to all.
— https://voximplant.com/blog/an-introduction-to-selective-forwarding-units
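A simulcast SFU receives several encodings of the same video from each sender and picks one encoding per receiver. Below is a minimal sketch of that selection. It assumes the SFU already has a bandwidth estimate for each receiver. The layer names and bitrates are illustrative, and `selectLayer` and `assignLayers` are hypothetical.

```typescript
interface SimulcastLayer {
  rid: string;         // RTP stream ID of the encoding, e.g. "q" / "h" / "f" (illustrative)
  bitrateKbps: number; // target bitrate the sender uses for this encoding
}

// Pick the highest-bitrate layer that fits the receiver's estimated bandwidth.
// Returns null if even the lowest layer does not fit.
function selectLayer(layers: SimulcastLayer[], availableKbps: number): SimulcastLayer | null {
  let best: SimulcastLayer | null = null;
  for (const layer of layers) {
    const fits = layer.bitrateKbps <= availableKbps;
    if (fits && (best === null || layer.bitrateKbps > best.bitrateKbps)) {
      best = layer;
    }
  }
  return best;
}

// Illustrative layers: quarter, half and full resolution.
const layers: SimulcastLayer[] = [
  { rid: "q", bitrateKbps: 150 },
  { rid: "h", bitrateKbps: 500 },
  { rid: "f", bitrateKbps: 1500 },
];

// Each receiver gets its own choice, so a slow receiver no longer limits the others.
function assignLayers(receivers: Map<string, number>): Map<string, string | null> {
  const picks = new Map<string, string | null>();
  receivers.forEach((kbps, id) => {
    const layer = selectLayer(layers, kbps);
    picks.set(id, layer ? layer.rid : null);
  });
  return picks;
}
```

With these layers, a receiver with 800 kbps gets the "h" layer and a receiver with 2000 kbps still gets "f". A plain SFU would have held everyone to what the slowest client can take.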
peerjs
- External reference: https://peerjs.com/
PeerJS wraps the browser’s WebRTC implementation to provide a complete, configurable, and easy-to-use peer-to-peer connection API. Equipped with nothing but an ID, a peer can create a P2P data or media stream connection to a remote peer.
They also provide the code for a simple signaling server at https://github.com/peers/peerjs-server.
Beware, though, that PeerJS needs a recent browser.
Browser support
- Firefox 80+
- Chrome 83+
- Edge 83+
- Safari 15+
We test PeerJS against these versions of Chrome, Edge, Firefox, and Safari with BrowserStack to ensure compatibility. It may work in other and older browsers, but we don’t officially support them. Changes to browser support will be a breaking change going forward.
debug
Take a look at chrome://webrtc-internals. This provides WebRTC stats and debugging data
— https://codelabs.developers.google.com/codelabs/webrtc-web
shim adapter
script src="https://webrtc.github.io/adapter/
— https://codelabs.developers.google.com/codelabs/webrtc-web
adapter.js is a shim to insulate apps from spec changes and prefix differences. (Though in fact, the standards and protocols used for WebRTC implementations are highly stable, and there are only a few prefixed names.)
— https://codelabs.developers.google.com/codelabs/webrtc-web
Security
Encryption is mandatory for all WebRTC components, and its JavaScript APIs can only be used from secure origins (HTTPS or localhost).
— https://codelabs.developers.google.com/codelabs/webrtc-web
Sending Data
RTCDataChannel
The syntax of RTCDataChannel is deliberately similar to WebSocket, with a send() method and a message event.
— https://codelabs.developers.google.com/codelabs/webrtc-web
Communication path
WebRTC is designed to work peer-to-peer, so users can connect by the most direct route possible
— https://codelabs.developers.google.com/codelabs/webrtc-web
Once RTCPeerConnection has that information, the ICE magic happens automatically. RTCPeerConnection uses the ICE framework to work out the best path between peers, working with STUN and TURN servers as necessary
RTCPeerConnection
RTCPeerConnection is the API used by WebRTC apps to create a connection between peers, and communicate audio and video.
NAT traversal
- STUN
  - to find a public IP address and port that work across the NAT
- TURN
  - to relay data if STUN could not find an address that works across the NAT
WebRTC APIs use STUN servers to get the IP address of your computer, and TURN servers to function as relay servers in case peer-to-peer communication fails
— https://codelabs.developers.google.com/codelabs/webrtc-web
In other words, a STUN server is used to get an external network address and TURN servers are used to relay traffic if direct (peer-to-peer) connection fails.
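In the browser, STUN and TURN servers are handed to RTCPeerConnection through the `iceServers` field of its configuration. The sketch builds that configuration object:

- `stun:stun.l.google.com:19302` is a commonly used public STUN server run by Google.
- The TURN URL and credentials are placeholders.
- `buildPeerConfig` is a hypothetical helper.
- Setting `iceTransportPolicy: "relay"` restricts ICE to relayed (TURN) candidates.

```typescript
// Mirrors the fields of the browser's RTCIceServer dictionary.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Subset of RTCConfiguration used here.
interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: "all" | "relay";
}

interface TurnCredentials {
  url: string; // e.g. "turn:turn.example.com:3478" (placeholder)
  username: string;
  credential: string;
}

function buildPeerConfig(turn?: TurnCredentials, relayOnly = false): PeerConfig {
  // STUN only reports the public IP:port, so it needs no credentials.
  const iceServers: IceServer[] = [{ urls: "stun:stun.l.google.com:19302" }];
  if (turn) {
    // TURN relays the media itself, so servers normally require credentials.
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  } else if (relayOnly) {
    throw new Error("relay-only mode needs a TURN server");
  }
  // "relay" makes ICE use only TURN candidates (handy for testing the relay path).
  return relayOnly ? { iceServers, iceTransportPolicy: "relay" } : { iceServers };
}

// In a browser (not run here): const pc = new RTCPeerConnection(buildPeerConfig(turn));
```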
Session Traversal Utilities for NAT
- External reference: https://blog.ivrpowers.com/post/technologies/what-is-stun-turn-server/
An app uses a STUN server to discover its IP:port from a public perspective. This process enables a WebRTC peer to get a publicly accessible address for itself and then pass it to another peer through a signaling mechanism in order to set up a direct link. (In practice, different NATs work in different ways and there may be multiple NAT layers, but the principle is still the same.)
STUN servers live on the public internet and have one simple task: check the IP:port address of an incoming request (from an app running behind a NAT) and send that address back as a response.
Sometimes, you can use a protocol called STUN (Session Traversal Utilities for NAT) that allows clients to discover their public IP address and the type of NAT they are behind. This information is used to establish the media connection. In most cases, a STUN server is only used during the connection setup and once that session has been established, media will flow directly between the peer and the Video Gateway (WebRTC).
— https://blog.ivrpowers.com/post/technologies/what-is-stun-turn-server/
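To make the STUN exchange concrete, the sketch follows RFC 5389. It encodes a Binding Request and also the server's reply, which carries an XOR-MAPPED-ADDRESS attribute. The address is XOR-ed with a fixed "magic cookie" so that NAT application-level gateways that rewrite IP addresses inside packets leave it alone.

Limits of the sketch:

- It only builds and parses bytes; sending the request over UDP (conventionally to port 3478) is left out.
- IPv6 is not handled.
- The function names are hypothetical.

```typescript
// Constants from RFC 5389 (STUN).
const MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;
const BINDING_SUCCESS = 0x0101;
const XOR_MAPPED_ADDRESS = 0x0020;
const FAMILY_IPV4 = 0x01;

// Client side: a Binding Request is a 20-byte header with no attributes.
function encodeBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, BINDING_REQUEST);
  view.setUint16(2, 0); // message length (attributes only): none
  view.setUint32(4, MAGIC_COOKIE);
  msg.set(transactionId, 8);
  return msg;
}

// Server side: answer with the IPv4 address and port the request arrived from.
function encodeBindingSuccess(transactionId: Uint8Array, ip: string, port: number): Uint8Array {
  const octets = ip.split(".").map(Number);
  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  const msg = new Uint8Array(32); // 20-byte header + 4-byte attribute header + 8-byte value
  const view = new DataView(msg.buffer);
  view.setUint16(0, BINDING_SUCCESS);
  view.setUint16(2, 12); // one attribute: 4-byte header + 8-byte value
  view.setUint32(4, MAGIC_COOKIE);
  msg.set(transactionId, 8);
  view.setUint16(20, XOR_MAPPED_ADDRESS);
  view.setUint16(22, 8);
  view.setUint8(25, FAMILY_IPV4); // byte 24 is reserved (left as zero)
  view.setUint16(26, port ^ (MAGIC_COOKIE >>> 16)); // X-Port
  const addr = ((octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]) >>> 0;
  view.setUint32(28, (addr ^ MAGIC_COOKIE) >>> 0); // X-Address
  return msg;
}

// Client side: pull the public IP:port out of a Binding Success response.
function decodeXorMappedAddress(msg: Uint8Array): { ip: string; port: number } | null {
  if (msg.length < 20) return null;
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (view.getUint16(0) !== BINDING_SUCCESS || view.getUint32(4) !== MAGIC_COOKIE) return null;
  const end = 20 + view.getUint16(2);
  if (end > msg.length) return null;
  let offset = 20;
  while (offset + 4 <= end) {
    const type = view.getUint16(offset);
    const length = view.getUint16(offset + 2);
    const value = offset + 4;
    const inBounds = value + length <= end;
    if (type === XOR_MAPPED_ADDRESS && length >= 8 && inBounds && view.getUint8(value + 1) === FAMILY_IPV4) {
      const port = view.getUint16(value + 2) ^ (MAGIC_COOKIE >>> 16);
      const a = (view.getUint32(value + 4) ^ MAGIC_COOKIE) >>> 0;
      const ip = [a >>> 24, (a >>> 16) & 0xff, (a >>> 8) & 0xff, a & 0xff].join(".");
      return { ip, port };
    }
    offset = value + Math.ceil(length / 4) * 4; // attribute values are padded to 4 bytes
  }
  return null;
}
```

In real code the 12-byte transaction ID should come from a cryptographically secure random source, and the client should check that the reply carries the same ID as its request.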
However, even with a properly set-up STUN server, some corporate networks are very restrictive (e.g., UDP traffic forbidden, only TCP port 443 allowed), so clients must use a TURN (Traversal Using Relays around NAT) server to relay traffic if the direct (peer to Video Gateway) connection fails. In these cases, you can install our TURN server (in another instance) to solve these issues.
— https://blog.ivrpowers.com/post/technologies/what-is-stun-turn-server/
Traversal Using Relays around NAT
Every TURN server supports STUN. A TURN server is a STUN server with additional built-in relaying functionality.
TURN is used to relay audio, video, and data streaming between peers, not signaling data!
Interactive Connectivity Establishment
The expression ‘finding candidates’ refers to the process of finding network interfaces and ports using the ICE framework.
— https://codelabs.developers.google.com/codelabs/webrtc-web
Interactive Connectivity Establishment (ICE) is a technique used in computer networking to find ways for two computers to talk to each other as directly as possible in peer-to-peer networking
— https://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment
Alice and Eve also need to exchange network information:
- Alice creates an RTCPeerConnection object with an onicecandidate handler.
- The handler is called when network candidates become available.
- In the handler, Alice sends stringified candidate data to Eve through their signaling channel.
- When Eve gets a candidate message from Alice, she calls addIceCandidate() to add the candidate to the remote peer description
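The candidate exchange in the list above comes down to two small handlers. The sketch assumes two things: a `send` function that writes to whatever signaling channel the app uses (hypothetical), and a JSON message format of our own choosing. In a browser, the candidate would come from the `icecandidate` event's `candidate.toJSON()`, and `addIceCandidate` would be the RTCPeerConnection method of that name.

```typescript
// Mirrors the fields of the browser's RTCIceCandidateInit dictionary.
interface IceCandidateInit {
  candidate?: string;
  sdpMid?: string | null;
  sdpMLineIndex?: number | null;
}

// Our own (illustrative) signaling message for a single candidate.
interface CandidateMessage {
  type: "candidate";
  candidate: IceCandidateInit;
}

// Alice: body of the onicecandidate handler, e.g.
//   pc.onicecandidate = (e) => forwardLocalCandidate(e.candidate ? e.candidate.toJSON() : null, send);
function forwardLocalCandidate(candidate: IceCandidateInit | null, send: (text: string) => void): void {
  // A null candidate signals that gathering has finished; there is nothing to forward.
  if (candidate === null) return;
  const msg: CandidateMessage = { type: "candidate", candidate };
  send(JSON.stringify(msg)); // stringified, then sent over the signaling channel
}

// Eve: handle one text message from the signaling channel.
// In a browser, addIceCandidate would be (c) => pc.addIceCandidate(c).
async function handleSignal(
  text: string,
  addIceCandidate: (c: IceCandidateInit) => Promise<void>,
): Promise<boolean> {
  const msg = JSON.parse(text) as Partial<CandidateMessage>;
  if (msg.type !== "candidate" || msg.candidate === undefined) return false; // not a candidate message
  await addIceCandidate(msg.candidate);
  return true;
}
```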
ICE tries to find the best path to connect peers. It tries all possibilities in parallel and chooses the most efficient option that works. ICE first tries to make a connection using the host address obtained from a device’s operating system and network card. If that fails (which it will for devices behind NATs), ICE obtains an external address using a STUN server and, if that fails, traffic is routed through a TURN relay server
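The preference for direct paths shows up in how ICE ranks candidates. RFC 8445 recommends computing a candidate's priority as 2^24 × type preference + 2^8 × local preference + (256 − component ID). Its recommended type preferences rank host above peer-reflexive, server-reflexive (STUN) and relayed (TURN) candidates.

The sketch below implements that formula under two simplifications:

- Browsers choose their own local preferences; this uses a fixed default.
- Connectivity checks actually rank candidate pairs, not single candidates.

```typescript
// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 } as const;
type CandidateType = keyof typeof TYPE_PREFERENCE;

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type: CandidateType, localPreference = 65535, componentId = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

interface Candidate {
  type: CandidateType;
  address: string;
}

// Order candidates from most to least preferred (direct paths first, relays last).
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => candidatePriority(b.type) - candidatePriority(a.type));
}
```

With the default arguments, a host candidate gets priority 2130706431 and a relayed candidate gets 16777215. A TURN relay is therefore only chosen when nothing better connects.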
JavaScript Session Establishment Protocol
- External reference: https://rtcweb-wg.github.io/jsep/
JSEP requires the exchange between peers of offer and answer, the media metadata mentioned above. Offers and answers are communicated in Session Description Protocol (SDP) format
This document describes the JavaScript Session Establishment Protocol (JSEP) that allows for full control of the signaling state machine from JavaScript
JSEP assumes a model in which a JavaScript application executes inside a runtime containing WebRTC APIs
Session Description Protocol
Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation.
— https://en.wikipedia.org/wiki/Session_Description_Protocol
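To see what "metadata about local media in SDP format" looks like, the sketch pairs a hand-written, illustrative SDP fragment with a tiny parser. The parser lists each media section (`m=` line) together with its codecs (`a=rtpmap` lines). It is only a sketch: real offers carry many more attributes (ICE credentials, the DTLS fingerprint, direction, and so on), and the parser ignores them. `parseSdpMedia` is a hypothetical name.

```typescript
interface MediaSection {
  kind: string;     // "audio", "video" or "application" (data channels)
  port: number;
  protocol: string;
  codecs: string[]; // "name/clockRate[/channels]", in the m= line's payload order
}

function parseSdpMedia(sdp: string): MediaSection[] {
  const found: { kind: string; port: number; protocol: string; payloads: string[]; rtpmap: Map<string, string> }[] = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, port, protocol, ...payloads] = line.slice(2).split(" ");
      found.push({ kind, port: Number(port), protocol, payloads, rtpmap: new Map() });
    } else if (line.startsWith("a=rtpmap:") && found.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const [payloadType, encoding] = line.slice("a=rtpmap:".length).split(" ");
      found[found.length - 1].rtpmap.set(payloadType, encoding);
    }
  }
  return found.map((m) => ({
    kind: m.kind,
    port: m.port,
    protocol: m.protocol,
    // Static payload types (such as 0 for PCMU) may legally omit their rtpmap line.
    codecs: m.payloads.map((pt) => m.rtpmap.get(pt) ?? `payload type ${pt}`),
  }));
}

// Hand-written, trimmed example; a real browser offer has many more lines.
const exampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");
```

Parsing `exampleSdp` yields an audio section offering opus and PCMU and a video section offering VP8 and H264. This codec list is what the answerer narrows down when it creates its answer.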
The offer/answer exchange in gory detail:
- Alice creates an RTCPeerConnection object.
- Alice creates an offer (an SDP session description) with the RTCPeerConnection createOffer() method.
- Alice calls setLocalDescription() with her offer.
- Alice stringifies the offer and uses a signaling mechanism to send it to Eve.
- Eve calls setRemoteDescription() with Alice’s offer, so that her RTCPeerConnection knows about Alice’s setup.
- Eve calls createAnswer(), which produces a local session description: Eve’s answer.
- Eve sets her answer as the local description by calling setLocalDescription().
- Eve then uses the signaling mechanism to send her stringified answer to Alice.
- Alice sets Eve’s answer as the remote session description using setRemoteDescription().
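JSEP's signaling state machine is what enforces the order of the steps above. The sketch models only the subset used in a plain offer/answer; provisional answers, rollback and renegotiation are left out. The state names match the browser's `RTCPeerConnection.signalingState` values, while the class and function are hypothetical.

```typescript
// Subset of RTCSignalingState used in a simple offer/answer.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type SdpType = "offer" | "answer";

class SignalingStateMachine {
  state: SignalingState = "stable";

  setLocalDescription(type: SdpType): void {
    if (type === "offer" && this.state === "stable") this.state = "have-local-offer";
    else if (type === "answer" && this.state === "have-remote-offer") this.state = "stable";
    else throw new Error(`cannot set local ${type} in state ${this.state}`);
  }

  setRemoteDescription(type: SdpType): void {
    if (type === "offer" && this.state === "stable") this.state = "have-remote-offer";
    else if (type === "answer" && this.state === "have-local-offer") this.state = "stable";
    else throw new Error(`cannot set remote ${type} in state ${this.state}`);
  }
}

// Walk through the offer/answer steps above, recording each side's state.
function runOfferAnswer(): string[] {
  const alice = new SignalingStateMachine();
  const eve = new SignalingStateMachine();
  const log: string[] = [];
  alice.setLocalDescription("offer");   // Alice: createOffer() + setLocalDescription(offer)
  log.push(`alice: ${alice.state}`);
  eve.setRemoteDescription("offer");    // Eve: offer arrives through signaling
  log.push(`eve: ${eve.state}`);
  eve.setLocalDescription("answer");    // Eve: createAnswer() + setLocalDescription(answer)
  log.push(`eve: ${eve.state}`);
  alice.setRemoteDescription("answer"); // Alice: answer arrives through signaling
  log.push(`alice: ${alice.state}`);
  return log;
}
```

Both peers end in "stable". Applying the steps out of order, such as setting a remote answer before sending any offer, throws in this sketch. A browser likewise rejects a description that is invalid for the current signaling state.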
Signaling is a process used in WebRTC to detect peers, exchange session descriptions to set up media ports, and share everything needed for the initial handshake.
— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
Nearly all WebRTC experiments rely on channels. “Channel” is a term used in realtime protocols like WebSocket for making sure data is transmitted privately to only the relevant clients.
— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
Channels are created dynamically for each peer, to make sure SDP/ICE is exchanged only among the relevant users.
— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
What about LAN or intranet?
You always need a signaling gateway, whether it is installed publicly or privately. A gateway can be a copy/paste mechanism or a realtime protocol.
— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
Signaling is used to detect peers; and exchange prerequisites to setup media connections.
— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
ICE, which stands for Interactive Connectivity Establishment, is a protocol used to capture the public IP addresses of the user. It lets us know:
- the public IP addresses of the user
- whether they are IPv4 or IPv6
- whether UDP is blocked, in which case it falls back to TCP, and failing that to a custom protocol
— https://www.webrtc-experiment.com/docs/WebRTC-Signaling-Concepts.html
signaling
WebRTC uses RTCPeerConnection to communicate streaming data between browsers, but also needs a mechanism to coordinate communication and to send control messages, a process known as signaling
— https://codelabs.developers.google.com/codelabs/webrtc-web
Signaling methods and protocols are not specified by WebRTC
— https://codelabs.developers.google.com/codelabs/webrtc-web
If you don’t want to roll your own, there are several WebRTC signaling servers available
Whatever you choose, you need an intermediary server to exchange signaling messages and app data between clients.
The message service for signaling needs to be bidirectional: client to server and server to client.
The EventSource API has been widely implemented. It enables server-sent events: data sent from a web server to a browser client through HTTP.
WebSocket is a more-natural solution, designed for full duplex client–server communication
It is also possible to handle signaling by having WebRTC clients poll a messaging server repeatedly through Ajax, but that leads to a lot of redundant network requests, which is especially problematic for mobile devices.
The design of Socket.io makes it simple to build a service to exchange messages, and Socket.io is particularly suited to WebRTC signaling because of its built-in concept of rooms.
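A relay with rooms is the core of most signaling servers, whether built on Socket.io or not. The sketch below is an in-memory version: clients join a room, and whatever one of them sends (offer, answer, candidate) is forwarded unchanged to the other members. It is hypothetical and is not the Socket.io API. A real server would put a WebSocket (or similar) connection behind each `deliver` callback.

```typescript
type Deliver = (message: string) => void;

// Hypothetical in-memory signaling relay: room name -> (client id -> delivery callback).
class SignalingRelay {
  private rooms = new Map<string, Map<string, Deliver>>();

  join(room: string, clientId: string, deliver: Deliver): void {
    let members = this.rooms.get(room);
    if (!members) {
      members = new Map();
      this.rooms.set(room, members);
    }
    members.set(clientId, deliver);
  }

  leave(room: string, clientId: string): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(clientId);
    if (members.size === 0) this.rooms.delete(room); // rooms disappear when empty
  }

  // Forward a message to everyone else in the room; the relay never inspects SDP or candidates.
  broadcast(room: string, fromId: string, message: string): number {
    const members = this.rooms.get(room);
    if (!members) return 0;
    let delivered = 0;
    members.forEach((deliver, id) => {
      if (id !== fromId) {
        deliver(message);
        delivered++;
      }
    });
    return delivered;
  }
}
```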
Signaling protocols and mechanisms are not defined by WebRTC standards.
Signaling is the process of coordinating communication
Signaling is used to exchange the following information:
- Session-control messages used to open or close communication
- Error messages
- Media metadata, such as codecs, codec settings, bandwidth, and media types
- Key data used to establish secure connections
- Network data, such as a host’s IP address and port as seen by the outside world
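The categories above map naturally onto a tagged message type. The sketch uses an illustrative JSON format of our own, not a standard one. The key data for the secure connection usually travels inside the SDP itself, as the DTLS certificate fingerprint (`a=fingerprint`), so it has no message kind of its own here. The `encode`, `decode` and `describe` helpers are hypothetical.

```typescript
// One message kind per category in the list above (names are illustrative).
type SignalMessage =
  | { kind: "join" | "leave"; room: string }                                                 // session control
  | { kind: "error"; reason: string }                                                         // error messages
  | { kind: "description"; sdpType: "offer" | "answer"; sdp: string }                        // media metadata (+ DTLS fingerprint in the SDP)
  | { kind: "candidate"; candidate: string; sdpMid: string | null; sdpMLineIndex: number | null }; // network data

const KINDS = ["join", "leave", "error", "description", "candidate"];

function encode(msg: SignalMessage): string {
  return JSON.stringify(msg);
}

function decode(text: string): SignalMessage {
  const value: unknown = JSON.parse(text);
  const kind = typeof value === "object" && value !== null ? (value as { kind?: unknown }).kind : undefined;
  if (typeof kind !== "string" || KINDS.indexOf(kind) === -1) {
    throw new Error(`not a known signaling message: ${text}`);
  }
  // Only the tag is checked here; a production server should validate the other fields too.
  return value as SignalMessage;
}

function describe(msg: SignalMessage): string {
  switch (msg.kind) {
    case "join":
    case "leave":
      return `session control (${msg.kind} ${msg.room})`;
    case "error":
      return `error: ${msg.reason}`;
    case "description":
      return `media metadata (${msg.sdpType})`;
    case "candidate":
      return `network data (${msg.candidate})`;
  }
}
```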
Notes linking here
- aiortc
- aiortc to create a remote web camera with an android phone
- go2rtc to use an android device as webcam
- i9300
- Socket.IO
- steveseguin/websocket_server
- stopmotion android apps
- use the camera with kivy on android (blog)
- VDO.Ninja