Learning Question

Why does large-scale video playback usually use HTTP segments instead of WebSocket?

Video playback can feel like one continuous stream, but modern internet video delivery commonly uses many HTTP requests for metadata, manifests, and time-based media segments. The player buffers, decodes, and plays those segments while requesting more.

Segment-Based Delivery

A video can be divided into time-based segments:

segment001 -> 0-4 seconds
segment002 -> 4-8 seconds
segment003 -> 8-12 seconds
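The mapping above can be sketched in a few lines. This is a minimal sketch that assumes every segment is exactly 4 seconds long and follows the hypothetical `segmentNNN` naming used in the example. Real segments can vary slightly in length, so a player would take durations from the manifest rather than compute them.

```python
# Minimal sketch: map a playback time to a fixed-duration segment.
# Assumptions (hypothetical): every segment is SEGMENT_SECONDS long and
# names follow the "segmentNNN" pattern above (1-based, 3 digits).
SEGMENT_SECONDS = 4


def segment_index(t_seconds: float, seg_len: float = SEGMENT_SECONDS) -> int:
    """Return the 1-based index of the segment that contains t_seconds."""
    if t_seconds < 0:
        raise ValueError("time must be non-negative")
    return int(t_seconds // seg_len) + 1


def segment_name(index: int) -> str:
    """Format an index as a hypothetical segment name, e.g. 2 -> segment002."""
    return f"segment{index:03d}"


def segment_range(index: int, seg_len: float = SEGMENT_SECONDS) -> tuple:
    """Return the (start, end) seconds covered by a segment."""
    start = (index - 1) * seg_len
    return start, start + seg_len


if __name__ == "__main__":
    for t in (0, 3.9, 4, 10):
        i = segment_index(t)
        print(t, "->", segment_name(i), segment_range(i))
```

Seeking to 10 seconds therefore only requires fetching segment003, not everything before it.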

The client requests the needed segment objects over HTTP. Playback and downloading overlap, so the viewer does not need the full video file before playback starts.
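The overlap between downloading and playback can be illustrated with a small simulation. This is a sketch under simplifying assumptions: segments hold 4 seconds of media each, they download one at a time back to back, and playback starts as soon as the first segment arrives. The download times passed in are made-up inputs, not measurements.

```python
# Minimal sketch: sequential segment downloads overlapping with playback.
# Assumptions (hypothetical): each segment holds SEG_SECONDS of media,
# downloads run back to back, and playback starts once segment 1 arrives.
SEG_SECONDS = 4.0


def simulate(download_times: list, seg_seconds: float = SEG_SECONDS) -> tuple:
    """Return (startup_delay, total_stall_time) in seconds."""
    if not download_times:
        raise ValueError("need at least one segment")

    # Wall-clock time at which each segment finishes downloading.
    arrival = []
    clock = 0.0
    for d in download_times:
        clock += d
        arrival.append(clock)

    startup = arrival[0]   # only the first segment is needed to start
    play_clock = startup   # wall-clock time the current segment starts playing
    stall = 0.0
    for i, ready in enumerate(arrival):
        if i > 0 and ready > play_clock:
            # Next segment has not arrived yet: playback freezes (rebuffering).
            stall += ready - play_clock
            play_clock = ready
        play_clock += seg_seconds  # this segment plays for seg_seconds
    return startup, stall


if __name__ == "__main__":
    print(simulate([1, 1, 1]))  # fast network: short startup, no stalls
    print(simulate([2, 6, 6]))  # each download slower than 4 s of playback
```

The startup delay depends only on the first segment, not on the size of the whole video. Stalls appear only when segments take longer to download than they take to play, which is the condition adaptive bitrate tries to avoid.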

HLS and DASH

HLS (HTTP Live Streaming) and MPEG-DASH (Dynamic Adaptive Streaming over HTTP) define how media is organized on top of HTTP.

HLS or DASH
-> playlists, manifests, variants, segments

HTTP
-> request and response delivery of those objects

TCP, TLS, QUIC, IP
-> transport, encryption, and routing

They are not replacements for HTTP. They define how media is packaged and selected so clients can request it over HTTP.
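On the HLS side, a media playlist is a plain-text file that lists segment URIs and their durations. The sketch below builds a minimal video-on-demand playlist using tags defined in RFC 8216 (`#EXTM3U`, `#EXT-X-TARGETDURATION`, `#EXTINF`, `#EXT-X-ENDLIST`). The segment file names are hypothetical, and a real packager would also produce a multivariant playlist pointing at one media playlist per quality variant. DASH describes the same structure in an XML manifest (MPD) instead.

```python
import math


def media_playlist(segments: list) -> str:
    """Return a minimal VOD HLS media playlist for (uri, duration_seconds) pairs."""
    if not segments:
        raise ValueError("need at least one segment")
    # EXT-X-TARGETDURATION must be an integer no smaller than any segment
    # duration rounded to the nearest integer; ceil() always satisfies that.
    target = max(math.ceil(d) for _, d in segments)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows decimal EXTINF durations
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")  # segment duration (title omitted)
        lines.append(uri)                          # URI the player fetches over HTTP
    lines.append("#EXT-X-ENDLIST")  # no more segments will be added (VOD)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(media_playlist([(f"segment{i:03d}.ts", 4.0) for i in range(1, 4)]), end="")
```

The playlist itself is just another HTTP object: the player downloads it, reads the segment URIs, and then requests those URIs one by one.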

Adaptive Bitrate

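Segment-based delivery lets the client choose the quality of each upcoming segment based on measured network throughput and how much video is already buffered. Switching works because each quality level is encoded separately and typically cut on the same time boundaries, so segment003 at 480p covers the same 8-12 seconds as segment003 at 1080p.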

segment001 -> 1080p
segment002 -> 1080p
segment003 -> 480p
segment004 -> 480p

The player does not have to commit to one quality level for the whole video.
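A per-segment quality decision can be sketched as a simple rule. This is a simplified heuristic for illustration, not the algorithm of any particular player: the bitrate ladder, the 0.8 safety factor, and the 5-second low-buffer threshold are all hypothetical values. It picks the highest rendition that fits within a fraction of measured throughput and drops to the lowest rendition when the buffer is nearly empty.

```python
# Hypothetical bitrate ladder in kbit/s, highest first.
# In practice the available variants come from the manifest.
LADDER = [("1080p", 5000), ("720p", 2800), ("480p", 1200), ("360p", 700)]


def choose_rendition(throughput_kbps: float, buffer_s: float,
                     safety: float = 0.8, panic_buffer_s: float = 5.0) -> str:
    """Pick a rendition for the next segment (simplified hypothetical rule)."""
    # Nearly empty buffer: take the cheapest rendition to avoid a stall.
    if buffer_s < panic_buffer_s:
        return LADDER[-1][0]
    # Leave headroom, since throughput estimates are noisy.
    budget = throughput_kbps * safety
    for name, kbps in LADDER:
        if kbps <= budget:
            return name
    return LADDER[-1][0]  # nothing fits: fall back to the lowest rendition


if __name__ == "__main__":
    # (throughput kbit/s, buffer seconds) measured before each request
    samples = [(8000, 12), (8000, 16), (2000, 18), (1800, 14)]
    for i, (tput, buf) in enumerate(samples, start=1):
        print(f"segment{i:03d} -> {choose_rendition(tput, buf)}")
```

With these made-up samples the output follows the example above: two segments at 1080p, then 480p after throughput drops. Production players combine throughput and buffer signals in more careful ways, but the decision is still made one segment at a time.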

CDN Fit

HLS and DASH fit CDNs because media becomes URL-addressable HTTP objects:

/video/720p/segment105.ts
/video/720p/segment106.ts
/video/480p/segment105.ts

Many viewers can request the same segment URL. A CDN edge can cache that object and serve it repeatedly.
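Edge caching keyed by URL can be sketched as a toy model. This assumes a hypothetical `origin_fetch` function standing in for a request to the origin server, and it ignores `Cache-Control` headers, expiry, and validation, all of which a real CDN honors. It only shows why stable URLs matter: identical requests collapse into a single origin fetch.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy CDN edge node: caches response bodies by URL, evicting least recently used."""

    def __init__(self, origin_fetch: Callable[[str], bytes], capacity: int = 1000):
        self.origin_fetch = origin_fetch  # hypothetical origin call, used only on a miss
        self.capacity = capacity
        self.store = OrderedDict()        # url -> body, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.store.move_to_end(url)   # mark as recently used
            self.hits += 1
            return self.store[url]
        self.misses += 1
        body = self.origin_fetch(url)
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used URL
        return body


if __name__ == "__main__":
    origin_requests = []

    def origin(url: str) -> bytes:
        origin_requests.append(url)
        return b"<media bytes for " + url.encode() + b">"

    edge = EdgeCache(origin)
    for _ in range(1000):  # 1000 viewers ask for the same segment
        edge.get("/video/720p/segment105.ts")
    edge.get("/video/480p/segment105.ts")
    print(edge.hits, edge.misses, len(origin_requests))  # 999 2 2
```

A thousand viewers of the same segment cost the origin one request. Each rendition has its own URL, so 720p and 480p copies are cached independently.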

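WebSocket traffic can pass through proxies and load balancers that support it, but after the initial HTTP Upgrade handshake the connection carries framed messages rather than HTTP requests and responses. Those messages have no per-object URLs, status codes, or cache headers, so a CDN edge cannot cache one viewer's messages and replay them to the next viewer the way it serves a cached segment.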

Core Mental Model

Large-scale video playback usually needs:

buffering
quality switching
CDN-cacheable media objects
many clients sharing the same segment responses

Those needs fit HLS or DASH over HTTP better than one opaque bidirectional WebSocket message channel.