Building a Real-Time WebRTC to HLS Streaming Platform: From P2P Video Calls to Live Broadcasting
I recently built a full-stack application that bridges the gap between real-time video calls and scalable live broadcasting: a system where users can hold WebRTC video calls while simultaneously broadcasting to HLS viewers. Here's the technical journey and the key learnings from the project.
The Challenge: Two Worlds of Video Streaming
The project had an interesting dual requirement:
- WebRTC side: Enable real-time, low-latency video calls between participants (think Google Meet)
- HLS side: Allow viewers to watch the ongoing conversation as a live stream (think YouTube Live)
This presents a fascinating technical challenge because WebRTC and HLS serve different purposes:
- WebRTC: Ultra-low latency (sub-second), P2P connections, perfect for interactive communication
- HLS: Higher latency (3-10 seconds), CDN-friendly, scalable to millions of viewers
Architecture Overview
The solution involved several key components working in harmony:
Frontend (Next.js + TypeScript)
- Stream Page (/stream): WebRTC participants with camera/microphone access
- Watch Page (/watch): HLS viewers consuming the live stream
- Real-time Communication: Socket.io for signaling and coordination
Backend (Node.js + TypeScript)
- Mediasoup: SFU (Selective Forwarding Unit) for WebRTC media routing
- FFmpeg: Media transcoding from WebRTC to HLS format
- Socket.io: WebSocket management for real-time signaling
- Express: HTTP server for HLS segment delivery
The Technical Deep Dive
1. WebRTC with Mediasoup SFU
Instead of direct P2P connections, I used Mediasoup as an SFU, an architecture with several advantages (listed after the code). Creating a server-side WebRTC transport looks like this:
// Creating WebRTC transports for media exchange
socket.on('createWebRtcTransport', async ({ sender }, callback) => {
  const transport = await router.createWebRtcTransport({
    // announcedIp is the externally reachable address of the server
    listenIps: [{ ip: '0.0.0.0', announcedIp: '192.168.1.38' }],
    enableUdp: true,
    enableTcp: true,
    // Note: STUN/TURN servers are configured on the client's transport, not here;
    // mediasoup itself answers ICE checks as an ice-lite endpoint
  });
  // Transport configuration and callback...
});
The SFU approach means:
- Each participant sends their media once to the server (see the client-side sketch after this list)
- The server forwards streams to other participants
- Much more scalable than full-mesh P2P for multiple participants
- Enables server-side processing (crucial for our HLS conversion)
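For context, the client half of this exchange looks roughly like the sketch below with mediasoup-client; routerRtpCapabilities and transportParams are assumed to arrive over Socket.io:
// Client side: each participant produces its media exactly once
import { Device } from 'mediasoup-client';

const device = new Device();
await device.load({ routerRtpCapabilities });                  // router capabilities via Socket.io
const sendTransport = device.createSendTransport(transportParams);
// ('connect' and 'produce' transport events must be wired to the signaling layer; omitted here)

const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
await sendTransport.produce({ track: stream.getVideoTracks()[0] });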
2. The WebRTC to HLS Bridge
The most technically challenging part was converting real-time WebRTC streams to HLS format. Here's how it works:
Step 1: Extract RTP Streams
// Create a plain transport to carry RTP out of mediasoup
const transport = await router.createPlainTransport({
  listenIp: '127.0.0.1',
  rtcpMux: false,     // FFmpeg expects RTCP on a separate port
  comedia: false,     // we tell the transport where to send (see below)
  enableSrtp: false,  // plain RTP, no encryption, since it stays on localhost
});

// Consume the WebRTC producer and forward it to the plain transport
const consumer = await transport.consume({
  producerId: videoProducer.producer.id,
  rtpCapabilities: router.rtpCapabilities,
  paused: false
});
Step 2: Generate SDP for FFmpeg
// Create SDP file describing the RTP streams
const sdpString = `v=0
o=- 0 0 IN IP4 127.0.0.1
s=FFMPEG
c=IN IP4 127.0.0.1
t=0 0
${sdpMedia}`;
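What ${sdpMedia} contains depends on what was negotiated. A minimal sketch for one video stream, reading payload type, codec name, and clock rate from the consumer and assuming the RTP port 5004 used above:
// Build the media section from the consumer's negotiated RTP parameters
const videoCodec = consumer.rtpParameters.codecs[0];
const sdpMedia = [
  `m=video 5004 RTP/AVP ${videoCodec.payloadType}`,
  `a=rtpmap:${videoCodec.payloadType} ${videoCodec.mimeType.split('/')[1]}/${videoCodec.clockRate}`,
  'a=sendonly',
].join('\n');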
Step 3: FFmpeg Transcoding Pipeline
const ffmpegArgs = [
  '-protocol_whitelist', 'file,udp,rtp',   // required to read a local SDP file plus RTP over UDP
  '-f', 'sdp', '-i', sdpFilePath,
  '-filter_complex', filterComplex,
  '-map', '[vout]',                        // use the composited video from the filter graph
  '-map', '0:a?',                          // pass audio through if the SDP carries any
  '-c:v', 'libx264',
  '-preset', 'ultrafast',                  // minimize encoding CPU cost and delay
  '-tune', 'zerolatency',                  // no lookahead or frame buffering
  '-c:a', 'aac',                           // HLS-friendly audio codec
  '-f', 'hls',
  '-hls_time', '1',                        // 1-second segments
  '-hls_list_size', '3',                   // keep a short live playlist
  '-hls_flags', 'delete_segments+round_durations+independent_segments',
  outputPath
];
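The argument list is then handed to a child process. A minimal sketch, assuming the ffmpeg binary is on the server's PATH:
import { spawn } from 'child_process';

const ffmpeg = spawn('ffmpeg', ffmpegArgs);
ffmpeg.stderr.on('data', (data) => console.log(`[ffmpeg] ${data}`)); // FFmpeg writes its log to stderr
ffmpeg.on('exit', (code) => console.log(`FFmpeg exited with code ${code}`));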
3. Handling Multiple Video Streams
One interesting challenge was compositing multiple video streams for HLS output:
// Single participant: simple scaling
if (videoProducers.length === 1) {
  filterComplex += '[0:v:0]scale=1280:720[vout];';
}
// Multiple participants: side-by-side layout
else if (videoProducers.length === 2) {
  filterComplex += '[0:v:0]scale=640:720[v0];';
  filterComplex += '[0:v:1]scale=640:720[v1];';
  filterComplex += '[v0][v1]hstack=inputs=2[vout];';
}
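The same approach scales to larger groups. For example, a 2x2 grid for four participants could be built with xstack (a hypothetical extension, not part of the current layout logic):
// Four participants: 2x2 grid (each tile 640x360 of a 1280x720 canvas)
let filterComplex = '';
for (let i = 0; i < 4; i++) {
  filterComplex += `[0:v:${i}]scale=640:360[v${i}];`;
}
filterComplex += '[v0][v1][v2][v3]xstack=inputs=4:layout=0_0|640_0|0_360|640_360[vout];';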
Key Technical Challenges and Solutions
1. Codec Compatibility
Problem: WebRTC typically uses VP8/VP9, while HLS prefers H.264.
Solution: Real-time transcoding with FFmpeg, optimized for low-latency with ultrafast preset and zerolatency tuning.
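If transcoding cost becomes a problem, one option is to negotiate H.264 end-to-end so FFmpeg can often copy the video stream instead of re-encoding it (compositing multiple participants still forces a decode, though). A sketch of a mediasoup router that offers H.264, assuming the participating browsers support it:
const router = await worker.createRouter({
  mediaCodecs: [
    { kind: 'audio', mimeType: 'audio/opus', clockRate: 48000, channels: 2 },
    {
      kind: 'video',
      mimeType: 'video/H264',
      clockRate: 90000,
      parameters: {
        'packetization-mode': 1,
        'profile-level-id': '42e01f',     // Constrained Baseline, broadly supported
        'level-asymmetry-allowed': 1,
      },
    },
  ],
});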
2. Synchronization Issues
Problem: Audio and video streams could drift out of sync during transcoding.
Solution: Careful RTP timestamp handling and periodic keyframe requests:
const keyFrameInterval = setInterval(() => {
  consumers.forEach((consumer) => {
    if (consumer && consumer.kind === 'video' && !consumer.closed) {
      consumer.requestKeyFrame();   // ask the producer for a fresh keyframe
    }
  });
}, 4000);
3. Latency Optimization
Problem: Each step in the pipeline adds latency.
Solution (the player side needs matching tuning; see the sketch after this list):
- 1-second HLS segments (vs typical 4-6 seconds)
- Aggressive FFmpeg settings for minimal buffering
- Direct RTP forwarding without unnecessary re-encoding
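On the watch page, the player has to cooperate: with 1-second segments, default hls.js buffering would add several seconds back. A sketch of a tighter configuration, assuming hls.js as the player (playlist path and buffer values are illustrative):
import Hls from 'hls.js';

const hls = new Hls({
  liveSyncDurationCount: 2,   // stay ~2 segments behind the live edge
  maxBufferLength: 4,         // seconds of forward buffer to keep
});
hls.loadSource('/hls/stream.m3u8');   // hypothetical playlist URL served by Express
hls.attachMedia(videoElement);        // videoElement: the <video> tag on the watch page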
4. Resource Management
Problem: FFmpeg processes and media transports need proper cleanup.
Solution: Comprehensive cleanup logic on client disconnect, sketched here with an assumed per-socket peers map, FFmpeg process handle, and HLS output directory:
socket.on('disconnect', () => {
  const peer = peers.get(socket.id);                          // assumed per-socket state
  peer?.producers.forEach((producer) => producer.close());    // closing a producer closes its consumers
  peer?.transports.forEach((transport) => transport.close());
  peers.delete(socket.id);
  ffmpegProcess?.kill('SIGINT');                              // let FFmpeg finalize the playlist
  fs.rmSync(hlsOutputDir, { recursive: true, force: true });  // remove leftover segments
});
Development Experience and Learnings
The Good
- Mediasoup: Excellent documentation and TypeScript support made WebRTC much more manageable
- FFmpeg flexibility: The filter complex system is incredibly powerful for video composition
- Next.js integration: Seamless development experience with both frontend and backend in one project
The Challenging
- Debugging media issues: When streams don't work, it's often unclear whether the fault lies in WebRTC, the SFU, FFmpeg, or the network
- Platform differences: Media handling varies significantly between browsers and devices
- Resource intensive: Multiple FFmpeg processes can quickly consume system resources
Development Setup
The project uses a simple npm script setup built on concurrently:
{
  "scripts": {
    "dev": "concurrently \"npm run server\" \"npm run next\"",
    "server": "ts-node server.ts",
    "next": "next dev"
  }
}
This allows both the Next.js frontend and Node.js backend to run simultaneously during development.
Performance Considerations
CPU Usage
FFmpeg transcoding is CPU-intensive. For production, consider:
- Hardware-accelerated encoding (VAAPI, NVENC); see the sketch after this list
- Multiple quality streams for adaptive bitrate
- Load balancing across multiple servers
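As an example of the first point, moving from libx264 to NVENC is mostly a matter of swapping the encoder arguments; a sketch assuming an NVIDIA GPU and an FFmpeg build with NVENC support:
// Hardware H.264 encoding on NVIDIA GPUs (replaces the libx264 arguments above)
const encoderArgs = [
  '-c:v', 'h264_nvenc',
  '-preset', 'p1',      // fastest NVENC preset
  '-tune', 'll',        // low-latency tuning
  '-bf', '0',           // no B-frames, keeps encode delay down
];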
Memory Management
- Proper cleanup of MediaStream objects
- FFmpeg process monitoring
- WebRTC connection state management
Network Optimization
- STUN/TURN server configuration for WebRTC (see the sketch after this list)
- CDN integration for HLS delivery
- Bandwidth adaptation based on participant count
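On the STUN/TURN point: since mediasoup answers ICE itself (ice-lite), STUN/TURN servers are supplied on the client when the transport is created. A sketch with placeholder TURN credentials:
const sendTransport = device.createSendTransport({
  ...transportParams,   // id, iceParameters, iceCandidates, dtlsParameters from the server
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'secret' }, // placeholders
  ],
});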
Future Enhancements
This foundation opens up many possibilities:
- Multi-quality HLS streams for adaptive bitrate
- Recording capabilities by extending the FFmpeg pipeline
- Chat integration alongside video streams
- Authentication and room management
- Mobile app support with React Native
Conclusion
Building a WebRTC to HLS bridge taught me that modern video streaming is beautifully complex. The intersection of real-time communication protocols, media processing, and web technologies creates fascinating engineering challenges.
The key insight is that you don't need to choose between WebRTC and HLS; you can have both. WebRTC provides the interactive, low-latency experience for active participants, while HLS enables scalable broadcasting to passive viewers.
If you're interested in video streaming technology, I'd highly recommend diving into projects like this. The combination of Mediasoup, FFmpeg, and modern web frameworks provides a powerful toolkit for building next-generation streaming applications.
The complete source code for this project demonstrates practical implementations of WebRTC SFU architecture, real-time media processing, and HLS streaming—all tied together in a modern TypeScript/Next.js application.