The Handshake That Happens Before You Say Hello: How WebRTC Builds Direct Browser-to-Browser Connections

Staff Writer | Contributing Writer | Jul 3, 2026 | 10 min read ✓ Reviewed

You click a video call link, your browser asks for camera access, and within seconds you're looking at someone on the other side of the world. It feels instant, like flipping a light switch. But before a single frame of video travels, your browser and theirs have to solve a genuinely difficult networking problem: how do two computers on the open internet, each hidden behind routers and firewalls they don't control, find each other and open a direct channel?

That problem is what WebRTC is built to solve — and the solution involves more moving parts than most people expect.

What WebRTC Actually Is

WebRTC stands for Web Real-Time Communication. It's a set of browser APIs and network protocols that let audio, video, and data travel directly between two browsers — without the media passing through a company's central server. That "peer-to-peer" design is the whole point. It keeps latency low, reduces server costs, and means your raw video stream isn't being processed by a third party in the middle.

WebRTC was open-sourced by Google in 2011 and was later standardized by both the W3C, which defines the browser APIs, and the IETF, which defines the underlying protocols. It has broad browser support across Chrome, Firefox, Safari, and Edge. That standardization matters: a Chrome user and a Firefox user can have a video call through a WebRTC app without either browser doing anything special. The shared standard handles the differences underneath.

Why Peer-to-Peer Is Harder Than It Sounds

Here's the problem. Your laptop usually doesn't have a public IP address that the whole internet can reach directly. It sits behind your home router, which itself sits behind your internet provider's network infrastructure. The router rewrites addresses as traffic crosses it, a technique called NAT (Network Address Translation). NAT exists for good reasons (mainly because public IPv4 addresses are scarce), but it creates a real headache when two devices want to talk directly.

Think of it like two people who each live in a large apartment building with no listed unit numbers. They each know the building's street address (the router's public IP), but not how to get mail to the specific apartment (the device's private IP inside the network). If neither person knows the other's unit number, they can't deliver anything directly.

On top of NAT, firewalls at companies, universities, and internet providers may block incoming connections entirely. So establishing a true peer-to-peer connection means navigating a maze of network boundaries neither peer controls — and doing it in real time, before the call starts.

The Solution: ICE, STUN, and TURN

WebRTC uses ICE (Interactive Connectivity Establishment) to discover a viable network path between two peers, combining STUN and TURN protocols to traverse NAT and firewalls. ICE is essentially a systematic process of gathering every possible way the two peers might reach each other, and then testing those options in order of preference until one works.

STUN: Figuring Out Your Public Face

The first step is self-discovery. Your browser needs to know what address it looks like to the outside world — not the private address your router assigned it, but the public IP address that the internet sees.

STUN (Session Traversal Utilities for NAT) servers let a browser discover its public IP address without routing media through a central server. A STUN server is a simple, lightweight service. Your browser sends it a quick message, and it replies: "You look like IP address 203.0.113.45, port 54321 from where I'm standing." Your browser now has a candidate address to share with the other peer. STUN servers handle very little traffic — just those quick discovery exchanges — which is why they can be run cheaply at scale.
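The exchange above can be sketched at the byte level. The following is a minimal illustration, assuming the message layout from the STUN specification (RFC 8489): a 20-byte Binding Request header, and an IPv4 XOR-MAPPED-ADDRESS value in which the port and address are XORed with a fixed "magic cookie". The function names are invented for this sketch, and the encoder stands in for what a STUN server would do so that the round trip can be checked. A browser does all of this internally.

```typescript
// Constants from the STUN specification (RFC 8489).
const MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;

// Build the 20-byte header of a Binding Request that carries no attributes.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, BINDING_REQUEST); // message type
  view.setUint16(2, 0);               // message length (no attributes)
  view.setUint32(4, MAGIC_COOKIE);    // fixed magic cookie
  message.set(transactionId, 8);      // 96-bit transaction ID
  return message;
}

// Server side (illustrative): encode an IPv4 XOR-MAPPED-ADDRESS value.
function encodeXorMappedAddress(ip: string, port: number): Uint8Array {
  const address = ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
  const value = new Uint8Array(8);
  const view = new DataView(value.buffer);
  view.setUint8(0, 0x00);                            // reserved
  view.setUint8(1, 0x01);                            // address family: IPv4
  view.setUint16(2, port ^ (MAGIC_COOKIE >>> 16));   // port XOR top 16 cookie bits
  view.setUint32(4, (address ^ MAGIC_COOKIE) >>> 0); // address XOR cookie
  return value;
}

// Client side: recover the public address the server observed.
function decodeXorMappedAddress(value: Uint8Array): { ip: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) {
    throw new Error("this sketch handles IPv4 only");
  }
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const address = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [
    address >>> 24,
    (address >>> 16) & 0xff,
    (address >>> 8) & 0xff,
    address & 0xff,
  ].join(".");
  return { ip, port };
}
```

Decoding the value produced for 203.0.113.45 and port 54321 returns the same address and port, which is the "public face" the browser then offers as a candidate.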

ICE collects multiple candidates: your private local address, the address discovered via STUN, and possibly others. Both peers share their candidate lists with each other and then try connecting through each combination, starting with the most direct options.
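"Most direct first" has a concrete meaning in the ICE specification (RFC 8445), which recommends a formula for ranking each candidate. The sketch below applies that formula with the type preferences the RFC suggests. The default local preference of 65535 and the function name are assumptions of this example; real browsers choose their own local preferences.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445: direct (host) addresses rank
// highest, STUN-discovered (srflx) next, TURN relays last.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(
  type: CandidateType,
  localPreference: number = 65535, // assumed: a single network interface
  componentId: number = 1,         // 1 = RTP
): number {
  return (
    Math.pow(2, 24) * TYPE_PREFERENCE[type] +
    Math.pow(2, 8) * localPreference +
    (256 - componentId)
  );
}

// Sort candidate types from most to least preferred.
function rankCandidates(types: CandidateType[]): CandidateType[] {
  return types.slice().sort((a, b) => candidatePriority(b) - candidatePriority(a));
}
```

With these defaults a host candidate scores 2130706431, a STUN-discovered one 1694498815, and a relay 16777215, which is why the direct options get tried first.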

TURN: The Fallback Relay

Sometimes the NAT or firewall situation is simply too restrictive. Certain network configurations block all incoming connections regardless of the public IP workaround. In those cases, a direct path genuinely doesn't exist.

That's where TURN (Traversal Using Relays around NAT) comes in. TURN servers relay media when a direct path cannot be found. ICE gathers a relay candidate from the TURN server along with the others but ranks it last, so media is routed through TURN only when none of the direct options pass their connectivity checks. Unlike STUN, a TURN server actually carries the video and audio traffic, so it needs real bandwidth and costs more to run. It's the last resort, not the default. But it's what ensures a call can still happen even in hostile network conditions.

The key insight: even when a TURN relay is used, it's only carrying the stream because no direct path was possible. The goal is always to find the most direct route, and TURN is the safety net when that fails.
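An application can check whether a given call ended up on the relay. The sketch below works on plain objects that mirror a small subset of the fields WebRTC connection statistics expose (candidate pairs plus local and remote candidates). The interfaces and the `isRelayed` helper are simplified assumptions for illustration, not the full statistics model.

```typescript
interface CandidateStats {
  type: "local-candidate" | "remote-candidate";
  id: string;
  candidateType: "host" | "srflx" | "prflx" | "relay";
}

interface CandidatePairStats {
  type: "candidate-pair";
  id: string;
  localCandidateId: string;
  remoteCandidateId: string;
  state: string;
  nominated?: boolean;
}

type StatsEntry = CandidateStats | CandidatePairStats;

// Returns true if the nominated, succeeded pair uses a relay on either side,
// false if it is direct, and undefined if no such pair exists yet.
function isRelayed(entries: StatsEntry[]): boolean | undefined {
  const byId = new Map<string, StatsEntry>();
  let selected: CandidatePairStats | undefined;
  for (const entry of entries) {
    byId.set(entry.id, entry);
    if (entry.type === "candidate-pair" && entry.nominated && entry.state === "succeeded") {
      selected = entry;
    }
  }
  if (!selected) return undefined;
  const ends = [byId.get(selected.localCandidateId), byId.get(selected.remoteCandidateId)];
  return ends.some(
    (end) => end !== undefined && end.type !== "candidate-pair" && end.candidateType === "relay",
  );
}
```

In a browser, the entries would come from the report returned by the peer connection's `getStats()` method, copied into an array first.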

The Signaling Problem: How Do the Peers Even Find Each Other?

ICE figures out how to connect, but there's a prior question: how do the two browsers know to connect in the first place? Before ICE can run, the peers need to exchange metadata — who they are, what codecs they support, what network candidates they have. This exchange is called signaling.

Here's something surprising: WebRTC deliberately leaves the signaling channel unspecified. All it requires is that SDP (Session Description Protocol) offers and answers be exchanged before a peer connection can be established; developers typically implement this with WebSockets or HTTP.

WebRTC doesn't tell you how to do signaling. It only tells you what information needs to be exchanged. The reason is pragmatic: different applications have wildly different needs, and forcing one signaling mechanism would make WebRTC inflexible. So developers are free to use WebSockets, regular HTTP requests, email, carrier pigeon — whatever reliably gets the SDP from one peer to the other.
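Because the transport is left open, the only fixed part of signaling is the shape of the messages. As a rough sketch, the three kinds of message can be modeled as a small union type, with an in-memory "room" standing in for whatever WebSocket or HTTP service a real application would use. `SignalMessage` and `SignalingRoom` are invented names, not part of WebRTC.

```typescript
// The payloads two peers need to trade, whatever the transport.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

type MessageHandler = (from: string, message: SignalMessage) => void;

// In-memory stand-in for a signaling server: it forwards each message to
// every other peer in the room and never inspects or stores media.
class SignalingRoom {
  private peers = new Map<string, MessageHandler>();

  join(peerId: string, onMessage: MessageHandler): void {
    this.peers.set(peerId, onMessage);
  }

  send(from: string, message: SignalMessage): void {
    this.peers.forEach((handler, peerId) => {
      if (peerId !== from) handler(from, message);
    });
  }
}
```

Swapping the `Map` for WebSocket connections changes the plumbing but not the message flow: an offer goes one way, an answer comes back, and candidates travel in both directions.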

What Is SDP?

SDP stands for Session Description Protocol. It's a text-based format: essentially a structured description of a peer's media capabilities and network information. One peer generates an "offer" (saying, in effect, "here's what I can do and here's how to reach me"), and the other generates an "answer" ("here's what I can do in response"). Once the offer and the answer have been exchanged, each side knows what the other supports, and ICE can test the candidate paths and establish the connection.
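To make "text-based format" concrete, here is a hand-written, heavily trimmed SDP fragment and a small parser that pulls out each media section and the codecs it lists. The sample is illustrative only (a real browser offer runs to dozens of lines), and `parseMediaSections` is a helper invented for this sketch.

```typescript
// Hand-written, trimmed SDP for illustration; not output from a real browser.
const SAMPLE_SDP = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

interface MediaSection {
  kind: string;     // "audio" or "video", taken from the m= line
  codecs: string[]; // codec names, taken from the a=rtpmap lines
}

function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("m=") === 0) {
      // "m=audio 9 UDP/TLS/RTP/SAVPF 111" -> "audio"
      sections.push({ kind: line.slice(2).split(" ")[0], codecs: [] });
    } else if (line.indexOf("a=rtpmap:") === 0 && sections.length > 0) {
      // "a=rtpmap:111 opus/48000/2" -> "opus"
      const encoding = line.split(" ")[1];
      sections[sections.length - 1].codecs.push(encoding.split("/")[0]);
    }
  }
  return sections;
}
```

Parsing the sample yields an audio section offering opus and a video section offering VP8 and H264.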

The signaling server's job ends there. Once the SDP has been exchanged and a peer-to-peer path found, the signaling server steps aside. It never touches the actual media.

Choosing a Language to Speak: Codec Negotiation

Even after two peers find each other, they need to agree on how they'll compress and decompress audio and video. These compression formats are called codecs. Chrome might support a different set of codecs than Safari, for instance, and they need to find common ground.

This negotiation happens inside the SDP exchange. Opus for audio and VP8, VP9, or H.264 for video are the codecs most commonly negotiated. Opus is particularly well-regarded because it handles everything from music to voice calls efficiently and adapts to changing network conditions. For video, the choice depends on what both browsers have available: the SDP offer-and-answer process is essentially the two peers comparing lists and picking what they share.
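"Comparing lists and picking what they share" amounts to an ordered intersection. The sketch below assumes the answering side keeps the offerer's preference order and matches codecs by name and clock rate. Real browsers also compare format parameters such as H.264 profiles, which this example leaves out.

```typescript
interface Codec {
  name: string;
  clockRate: number;
}

// Keep the offered codecs that the answering side also supports,
// preserving the offerer's order of preference (an assumption of this sketch).
function commonCodecs(offered: Codec[], supported: Codec[]): Codec[] {
  return offered.filter((offer) =>
    supported.some(
      (local) =>
        local.name.toLowerCase() === offer.name.toLowerCase() &&
        local.clockRate === offer.clockRate,
    ),
  );
}
```

If one browser offers VP9, VP8, and H264 and the other supports only H264 and VP8, the shared list is VP8 followed by H264, and the first entry is the one that would normally be used.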

The Developer's Entry Point: RTCPeerConnection

If you're a web developer building on WebRTC, you mostly don't interact with ICE, STUN, TURN, and SDP directly at a low level. There's an API that wraps all of it.

The RTCPeerConnection API is the core JavaScript interface for WebRTC. It exposes ICE negotiation, codec selection, and media track handling to web developers. You configure it with your STUN and TURN server addresses, attach media streams to it, and handle the SDP offers and answers. The ICE negotiation runs in the background. When a working connection path is found, the API surfaces it and media starts flowing.
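The configuration step might look like the sketch below. The object it builds follows the shape `RTCPeerConnection` accepts (an `iceServers` list and an `iceTransportPolicy`), but the types are re-declared locally so the example runs outside a browser. The helper name, the server URLs in the usage note, and the credentials are all hypothetical.

```typescript
interface IceServer {
  urls: string;
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

interface TurnAccount {
  url: string;
  username: string;
  credential: string;
}

// Build a configuration with one STUN server and, optionally, one TURN server.
// relayOnly forces all media through TURN, which can help when testing the relay path.
function buildPeerConfig(stunUrl: string, turn?: TurnAccount, relayOnly: boolean = false): PeerConfig {
  if (relayOnly && !turn) {
    throw new Error("relay-only mode needs a TURN server");
  }
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return { iceServers, iceTransportPolicy: relayOnly ? "relay" : "all" };
}
```

In a browser, the result of something like `buildPeerConfig("stun:stun.example.org:3478", { url: "turn:turn.example.org:3478", username: "demo", credential: "secret" })` would be passed to `new RTCPeerConnection(...)`.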

This abstraction is what makes WebRTC usable in a browser environment. The underlying network machinery is complex, but most of its complexity is managed internally by the browser's WebRTC implementation.

Security Is Mandatory, Not Optional

One of WebRTC's most important properties is that encryption isn't a feature you turn on — it's built into the specification and can't be turned off.

WebRTC encrypts all media streams using DTLS (Datagram Transport Layer Security) and SRTP (Secure Real-time Transport Protocol), making encryption mandatory by specification. DTLS handles the key exchange — the cryptographic handshake that establishes a shared secret between peers. SRTP then uses that secret to encrypt the actual audio and video packets in transit. Even if media is relayed through a TURN server, it remains encrypted end-to-end between the two browser peers; the relay server can't see inside the packets.
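STUN checks, the DTLS handshake, and the SRTP media typically all share the single path that ICE selected, so a receiver has to tell the three apart. A minimal sketch of how that can be done, assuming the first-byte ranges defined in RFC 7983 (the function name is invented for this example):

```typescript
type PacketKind = "stun" | "dtls" | "rtp-or-rtcp" | "unknown";

// Classify an incoming packet by its first byte, using the ranges from RFC 7983.
function classifyPacket(firstByte: number): PacketKind {
  if (firstByte >= 0 && firstByte <= 3) return "stun";            // ICE connectivity checks
  if (firstByte >= 20 && firstByte <= 63) return "dtls";          // key exchange
  if (firstByte >= 128 && firstByte <= 191) return "rtp-or-rtcp"; // SRTP-protected media
  return "unknown";
}
```

A packet beginning with byte 22 (a DTLS handshake record) goes to the key exchange, while one beginning with 0x80 (RTP version 2) is treated as encrypted media.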

This matters because real-time media is inherently sensitive. Mandatory encryption means developers can't accidentally ship an unencrypted video call, and users don't have to check a setting or trust that the developer remembered to enable security.

Putting It All Together: The Full Sequence

Here's the complete picture of what happens in the time between you clicking "join call" and your face appearing on someone else's screen:

  1. Signaling begins. Your browser connects to a signaling server (via WebSocket or similar) and sends an SDP offer describing your media capabilities and initial network candidates.
  2. The offer reaches the other peer through the signaling server, which acts only as a message relay.
  3. ICE candidate gathering runs on both sides. Each browser contacts STUN servers to discover its public-facing address and collects all possible candidate paths.
  4. Candidates are exchanged through the signaling channel, added to each side's RTCPeerConnection.
  5. ICE tests candidate pairs in order of preference — local-to-local first, then STUN-discovered paths, then TURN relay as a last resort.
  6. A working path is found. The DTLS handshake runs over that path, establishing the keys that SRTP will use to encrypt the media.
  7. Media flows. Audio and video packets travel peer-to-peer (or via TURN relay), encrypted with SRTP, using the negotiated codecs.

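The signaling server's role ends after step 4. The connectivity checks, the DTLS handshake, and the media itself all travel between the two browsers, or through the TURN relay when one is needed.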
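The "order of preference" in step 5 applies to pairs of candidates, one from each peer. RFC 8445 ranks a pair as 2^32 × min(G, D) + 2 × max(G, D) + (1 if G > D, else 0), where G and D are the priorities of the controlling and controlled peer's candidates. The sketch below compares the three terms in turn instead of computing the large number, which gives the same ordering as long as candidate priorities stay below 2^31, as the RFC requires. The interface and labels are assumptions of this example.

```typescript
interface CandidatePair {
  label: string;
  controlling: number; // priority of the controlling peer's candidate (G)
  controlled: number;  // priority of the controlled peer's candidate (D)
}

// Comparator that sorts pairs from highest to lowest pair priority.
function comparePairs(a: CandidatePair, b: CandidatePair): number {
  const minA = Math.min(a.controlling, a.controlled);
  const minB = Math.min(b.controlling, b.controlled);
  const maxA = Math.max(a.controlling, a.controlled);
  const maxB = Math.max(b.controlling, b.controlled);
  const tieA = a.controlling > a.controlled ? 1 : 0;
  const tieB = b.controlling > b.controlled ? 1 : 0;
  // The weaker side dominates, then the stronger side, then the tie-breaker.
  return minB - minA || maxB - maxA || tieB - tieA;
}

function checkOrder(pairs: CandidatePair[]): string[] {
  return pairs.slice().sort(comparePairs).map((pair) => pair.label);
}
```

Because the weaker of the two candidates dominates the score, a pair is only as preferred as its least direct end, which is why relay pairs sink to the bottom of the check list.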

Why This Architecture Matters

The peer-to-peer design has real consequences beyond just being technically elegant. When media flows directly between browsers, latency drops because packets don't take a detour through a central server. The service provider doesn't bear the bandwidth cost of carrying every video stream — only the lightweight signaling traffic. And because encryption is mandatory and end-to-end, the provider's infrastructure never has access to the raw media content.

That said, pure peer-to-peer has limits. Group calls with many participants can overwhelm a single browser trying to send separate streams to every other participant simultaneously. That's why many video conferencing services use a server-side media router (called an SFU, for Selective Forwarding Unit) even for WebRTC calls, letting each browser send one stream up and receive routed streams down. The WebRTC protocols still handle the encryption and codec negotiation; only the routing topology changes. One consequence is that the SFU is itself a WebRTC endpoint, so DTLS and SRTP protect each hop between a browser and the SFU instead of the whole path between participants, unless the application adds a further end-to-end encryption layer.
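The arithmetic behind that limit is simple enough to sketch. Assuming every participant sends one stream to, and receives one stream from, every other participant, the counts for a full mesh and for an SFU compare as follows (the function and field names are invented for this example):

```typescript
interface StreamCounts {
  uploadsPerPeer: number;
  downloadsPerPeer: number;
  totalConnections: number;
}

// Full mesh: every browser connects directly to every other browser.
function meshCounts(participants: number): StreamCounts {
  const others = Math.max(participants - 1, 0);
  return {
    uploadsPerPeer: others,
    downloadsPerPeer: others,
    totalConnections: (participants * others) / 2,
  };
}

// SFU: every browser holds one connection to the server and uploads once.
function sfuCounts(participants: number): StreamCounts {
  const others = Math.max(participants - 1, 0);
  return {
    uploadsPerPeer: participants > 1 ? 1 : 0,
    downloadsPerPeer: others,
    totalConnections: participants,
  };
}
```

For a six-person call, a mesh needs 15 peer connections and each browser encodes and uploads 5 streams; with an SFU each browser uploads once and still receives 5.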

The Elegance Underneath the Complexity

WebRTC is a good example of a system that looks simple from the outside — click, connect, talk — because a lot of careful engineering is hidden underneath. The separation between signaling (deliberately unspecified) and media transport (tightly specified), the layered fallback from direct connection to STUN-assisted to TURN-relayed, the mandatory encryption, the codec negotiation through SDP: each piece solves a real problem that couldn't just be wished away.

The next time a video call connects in seconds, you'll know that what felt instant was actually a fast, automated negotiation across network boundaries that most software doesn't cross gracefully — a handshake that had to happen before you said hello.

Staff Writer

Contributing Writer at UMI Groups
