Learning Question
Why does large-scale video playback usually use HTTP segments instead of WebSocket?
Video playback can feel like one continuous stream, but modern internet video delivery commonly uses many HTTP requests for metadata, manifests, and time-based media segments. The player buffers, decodes, and plays those segments while requesting more.
Segment-Based Delivery
A video can be divided into time-based segments:
segment001 -> 0-4 seconds
segment002 -> 4-8 seconds
segment003 -> 8-12 seconds
The client requests the needed segment objects over HTTP. Playback and downloading overlap, so the viewer does not need the full video file before playback starts.
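As a minimal sketch of the time-to-segment mapping, assuming fixed 4-second segments numbered from 1 as in the listing above (the function names are hypothetical, and real packagers may use variable durations):

```python
SEGMENT_SECONDS = 4  # assumed fixed segment duration, matching the listing above


def segment_for_time(position_seconds: float) -> str:
    """Return the name of the segment that contains a playback position."""
    if position_seconds < 0:
        raise ValueError("position must be non-negative")
    # Segments are numbered from 1, so position 0 falls in segment001.
    index = int(position_seconds // SEGMENT_SECONDS) + 1
    return f"segment{index:03d}"


def segment_range(index: int) -> tuple[int, int]:
    """Return the (start, end) time in seconds covered by a segment number."""
    start = (index - 1) * SEGMENT_SECONDS
    return start, start + SEGMENT_SECONDS
```

A seek to 9.5 seconds would therefore only need segment003, not the segments before it.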
HLS and DASH
HLS and MPEG-DASH define media organization above HTTP.
HLS or DASH
-> playlists, manifests, variants, segments
HTTP
-> request and response delivery of those objects
TCP, TLS, QUIC, IP
-> lower-layer transport, encryption, and routing
HLS and DASH are not replacements for HTTP. They define how media is packaged and described so that clients can select and request it over HTTP.
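To make the layering concrete, here is a small sketch that reads a simplified HLS-style master playlist and lists its variants. The playlist text, the URIs, and the parse_variants helper are illustrative assumptions. The attribute parsing is deliberately naive and would not handle quoted values that contain commas.

```python
# Illustrative master playlist; bandwidths and URIs are made up.
MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480
480p/playlist.m3u8
"""


def parse_variants(text: str) -> list[dict]:
    """Pair each #EXT-X-STREAM-INF line with the URI line that follows it."""
    variants = []
    pending = None  # attributes waiting for their URI line
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = {}
            # Naive split: assumes no commas inside attribute values.
            for pair in line.split(":", 1)[1].split(","):
                key, _, value = pair.partition("=")
                attrs[key] = value
            pending = attrs
        elif line and not line.startswith("#") and pending is not None:
            variants.append({
                "bandwidth": int(pending["BANDWIDTH"]),
                "resolution": pending.get("RESOLUTION", ""),
                "uri": line,
            })
            pending = None
    return variants
```

Each variant URI is just another object the client fetches with an ordinary HTTP request, which is the sense in which the playlist format sits above HTTP.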
Adaptive Bitrate
Segment-based delivery lets the client choose future quality based on network speed and buffer state.
segment001 -> 1080p
segment002 -> 1080p
segment003 -> 480p
segment004 -> 480p
The player does not have to commit to one quality level for the whole video.
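One simple way a player might make that per-segment choice is sketched below. The bitrate ladder, the safety factor, and the low-buffer threshold are all assumed values for illustration. Real players use more elaborate heuristics.

```python
# Hypothetical bitrate ladder: (label, bits per second the rendition needs).
LADDER = [("480p", 1_200_000), ("720p", 2_800_000), ("1080p", 5_000_000)]


def choose_quality(throughput_bps: float, buffer_seconds: float,
                   safety: float = 0.8, low_buffer: float = 5.0) -> str:
    """Pick the quality for the next segment from throughput and buffer state."""
    # With little media buffered, take the cheapest rendition to avoid a stall.
    if buffer_seconds < low_buffer:
        return LADDER[0][0]
    # Only spend a fraction of the measured throughput, leaving headroom.
    budget = throughput_bps * safety
    choice = LADDER[0][0]
    for label, bitrate in LADDER:  # ladder is ordered from lowest to highest
        if bitrate <= budget:
            choice = label
    return choice
```

Because the function runs again before every segment request, a drop in measured throughput changes only the segments that have not been requested yet.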
CDN Fit
HLS and DASH fit CDNs because media becomes URL-addressable HTTP objects:
/video/720p/segment105.ts
/video/720p/segment106.ts
/video/480p/segment105.ts
Many viewers can request the same segment URL. A CDN edge can cache that object and serve it repeatedly.
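The effect of URL-keyed caching can be sketched with a toy edge cache. The EdgeCache class and fake_origin function are hypothetical stand-ins. A real CDN also honors cache headers, expiry, and eviction, which this sketch ignores.

```python
class EdgeCache:
    """Toy CDN edge: stores responses keyed by URL and counts hits and misses."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1
        else:
            # First request for this URL: go to the origin once and keep the result.
            self.misses += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


origin_requests = []


def fake_origin(url: str) -> bytes:
    """Stand-in origin server that records every request it receives."""
    origin_requests.append(url)
    return f"bytes of {url}".encode()


edge = EdgeCache(fake_origin)
for viewer in range(1000):  # 1000 viewers ask for the same segment URL
    edge.get("/video/720p/segment105.ts")
edge.get("/video/480p/segment105.ts")  # a different URL is a separate object
```

In this run the origin is contacted only twice, once per distinct URL, while the remaining 999 requests are served from the edge.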
WebSocket connections can be proxied by infrastructure that supports them, but the messages inside a connection are not HTTP objects. They have no stable URL, method, response status, or cache headers, so a CDN edge cannot cache one message and reuse it for other viewers.
Core Mental Model
Large-scale video playback usually needs:
buffering
quality switching
CDN-cacheable media objects
many clients sharing the same segment responses
Those needs fit HLS or DASH over HTTP better than one opaque bidirectional WebSocket message channel.
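The buffering need can be illustrated with a rough simulation, assuming 4-second segments, sequential downloads, and playback that starts once the first segment has arrived. The simulate_buffer function and its parameters are hypothetical. Decoding time and parallel requests are ignored.

```python
def simulate_buffer(download_seconds, segment_seconds=4.0, startup_segments=1):
    """Return the buffered media (in seconds) after each segment download."""
    buffer_level = 0.0
    playing = False
    levels = []
    for count, download_time in enumerate(download_seconds, start=1):
        if playing:
            # Playback drains the buffer while the next segment downloads;
            # reaching zero means the viewer saw a stall.
            buffer_level = max(0.0, buffer_level - download_time)
        buffer_level += segment_seconds  # the finished segment joins the buffer
        if count >= startup_segments:
            playing = True
        levels.append(buffer_level)
    return levels
```

With download times of 1, 1, 6, and 1 seconds, the buffer dips during the slow third download but never empties, which is the cushion that lets playback ride out short network slowdowns.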