Q1. How Does HTTP Deliver Data for Video Streaming?
HTTP delivers video streaming data by letting the client repeatedly request media metadata and small time-based media segments, while the player buffers, decodes, and plays the received data.
The important point is that playback and downloading overlap in time. The client does not usually wait for the entire video file to finish downloading before playback begins.
So HTTP video streaming is usually not one endless server push. It is usually a client-driven process where the player repeatedly fetches time-based media segments over persistent HTTP connections.
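A toy simulation makes the overlap concrete. The Python below is a minimal sketch, not how any particular player works. It assumes segments download one after another, the buffer has no size limit, and each segment holds 4 seconds of media. The hypothetical `download_secs` list gives how long each segment takes to arrive. Playback starts as soon as the first segment is buffered, and the sketch also reports any stall time.

```python
def simulate_playback(download_secs, segment_secs=4.0):
    """Toy model: sequential segment downloads overlapping with playback.

    Returns (startup_delay, total_stall) in seconds. Assumes playback starts
    as soon as the first segment is buffered and the buffer has no size limit.
    """
    clock = 0.0            # wall-clock time when the latest segment finished downloading
    startup_delay = None
    media_runs_out = 0.0   # wall-clock time when buffered media will be exhausted
    total_stall = 0.0
    for d in download_secs:
        clock += d
        if startup_delay is None:
            startup_delay = clock          # first segment buffered: start playing
            media_runs_out = clock
        elif clock > media_runs_out:
            # The buffer ran dry before this segment arrived: playback stalls.
            total_stall += clock - media_runs_out
            media_runs_out = clock
        media_runs_out += segment_secs     # this segment adds more playable media
    return startup_delay, total_stall
```

With three segments that each take 1 second to download, `simulate_playback([1.0, 1.0, 1.0])` returns `(1.0, 0.0)`: playback begins after 1 second instead of the 3 seconds a full download would take.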
Q2. Is the video sent as one continuous HTTP response?
Usually no. Modern internet video streaming commonly divides the media into small time-based segments, such as a few seconds per segment.
For example:
segment001 -> 0-4 seconds
segment002 -> 4-8 seconds
segment003 -> 8-12 seconds
The viewer experiences one continuous video, but the client may internally request and receive many small media segments.
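Because each segment in this example covers a fixed span of media time, finding the segment for a playback position (for example, after a seek) is simple arithmetic. A minimal Python sketch, assuming fixed 4-second segments and the `segmentNNN` naming used above. Real players read segment durations and URLs from the playlist or manifest, and durations can vary.

```python
import math


def segment_name(playback_secs: float, segment_secs: float = 4.0) -> str:
    """Map a playback position to the segment that contains it (segment001 = 0-4 s)."""
    if playback_secs < 0:
        raise ValueError("playback position must be non-negative")
    # A boundary such as 4.0 s belongs to the segment that starts there.
    index = math.floor(playback_secs / segment_secs) + 1  # segments are numbered from 1
    return f"segment{index:03d}"
```

For example, `segment_name(9.5)` returns `"segment003"`, the 8-12 second segment.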
Q3. What does the client actually receive?
The client is not receiving finished screen images. It receives compressed and encoded video/audio data.
A simplified flow is:
compressed video/audio data
-> network transfer
-> client buffer
-> decoder
-> screen and speaker output
The player turns the received media data into video frames and audio output.
Q4. Why does one video create many HTTP requests and responses?
A single video can involve many HTTP requests and responses because the client may request a playlist, a quality-specific manifest, initialization data, and many media segments.
For example, if each segment contains 4 seconds of media, a 10-minute (600-second) video needs about 600 / 4 = 150 segment requests for one track at one quality level, not counting manifest, initialization, and other supporting requests. If audio is delivered as a separate track, its segments add further requests.
The key point is that these requests are usually made per media segment, not per video frame.
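The estimate is duration divided by segment length, rounded up for a final partial segment. A minimal Python sketch; the `tracks` and `supporting_requests` parameters are illustrative assumptions, since the real numbers depend on how a service packages audio, video, and manifests.

```python
import math


def count_requests(duration_secs, segment_secs=4.0, tracks=1, supporting_requests=0):
    """Estimate HTTP requests for one playback at a single quality level.

    tracks: number of separately segmented tracks (e.g. 2 if audio is separate).
    supporting_requests: playlists, manifests, init segments, and so on.
    """
    segments_per_track = math.ceil(duration_secs / segment_secs)  # round up for a partial last segment
    return segments_per_track * tracks + supporting_requests
```

`count_requests(600)` gives 150, matching the estimate above; with separate audio and a hypothetical 4 supporting requests, `count_requests(600, tracks=2, supporting_requests=4)` gives 304.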
Q5. Does every HTTP request require a new handshake?
No. A handshake happens when a connection is created, not every time an HTTP request is sent.
A more accurate model is:
create TCP connection
-> TCP 3-way handshake
create HTTPS session
-> TLS handshake
send many HTTP requests and responses
-> reuse the existing connection when possible
So many segment requests do not imply the same number of TCP handshakes.
The key distinction is that HTTP requests are repeated application-layer messages, while TCP, TLS, or QUIC handshakes belong to connection setup.
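Connection reuse is easy to observe with Python's standard library. This sketch starts a throwaway HTTP/1.1 server on localhost and sends several segment requests through one `http.client.HTTPConnection`; the server counts how many TCP connections it accepted and how many requests arrived. It uses plain HTTP, so there is no TLS handshake, and the segment paths and the `fetch_segments` helper are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SegmentHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open between requests

    def handle(self):
        # handle() runs once per accepted TCP connection and loops over its requests.
        with self.server.lock:
            self.server.connections += 1
        super().handle()

    def do_GET(self):
        with self.server.lock:
            self.server.requests += 1
        body = f"fake media bytes for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "video/mp2t")
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the response
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_segments(count):
    server = ThreadingHTTPServer(("127.0.0.1", 0), SegmentHandler)  # port 0: pick a free port
    server.lock = threading.Lock()
    server.connections = 0
    server.requests = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()

    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for i in range(1, count + 1):
            conn.request("GET", f"/video/720p/segment{i:03d}.ts")
            response = conn.getresponse()
            response.read()  # drain the body so the connection can carry the next request
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return server.connections, server.requests
```

Calling `fetch_segments(5)` should report one connection and five requests. Changing `protocol_version` to `"HTTP/1.0"` makes the server close the connection after each response, so the client has to reconnect (and repeat the TCP handshake) for every request.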
Q6. What changes with HTTP/1.1, HTTP/2, and HTTP/3?
With HTTP/1.1, persistent connections allow multiple requests and responses to reuse the same TCP connection.
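With HTTP/2, multiple concurrent request/response streams are multiplexed over one TCP connection, so a new request does not have to wait for an earlier response to finish. A lost TCP packet can still delay every stream on that connection.
With HTTP/3, HTTP runs over QUIC (on top of UDP) instead of TCP. QUIC combines transport setup with the TLS 1.3 handshake, so there is no separate TCP 3-way handshake.
The general principle is the same in all three versions: a connection is set up once, and many HTTP requests and responses then reuse it.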
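A rough round-trip count shows why reusing a connection matters and where HTTP/3 saves time. This is a back-of-the-envelope Python model, assuming full handshakes and ignoring DNS, TCP Fast Open, 0-RTT session resumption, and server processing time. The stack labels are hypothetical keys, not a real API.

```python
# Approximate round trips (RTTs) needed before the first HTTP request can be sent
# on a brand-new connection. Ignores DNS, TCP Fast Open, 0-RTT resumption,
# and server processing time.
SETUP_RTTS = {
    "tcp+tls1.2": 1 + 2,  # TCP 3-way handshake, then a full TLS 1.2 handshake
    "tcp+tls1.3": 1 + 1,  # TCP 3-way handshake, then a full TLS 1.3 handshake
    "quic": 1,            # QUIC handshake carries TLS 1.3 (used by HTTP/3)
}


def time_to_first_response_ms(stack: str, rtt_ms: float, reused: bool = False) -> float:
    """Rough time until the first response byte arrives for one request."""
    setup = 0 if reused else SETUP_RTTS[stack]
    return (setup + 1) * rtt_ms  # +1 RTT for the request/response exchange itself
```

With a 50 ms round-trip time, the model gives about 150 ms for the first response on a new TCP + TLS 1.3 connection, about 100 ms on a new QUIC connection, and about 50 ms on an already open connection of any kind.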
Q7. What are HLS and MPEG-DASH doing on top of HTTP?
Viewer-side video streaming commonly uses HTTP-based adaptive streaming, especially HLS or MPEG-DASH.
These are not replacements for HTTP in the usual sense. They define how media playlists, manifests, quality variants, and segments are organized so that the client can fetch them over HTTP.
A useful layering model is:
HLS / MPEG-DASH
-> streaming format: playlists/manifests, quality variants, and segments
HTTP/1.1, HTTP/2, or HTTP/3
-> request/response delivery of playlists and segments
TCP (usually with TLS) or QUIC (over UDP, with TLS built in)
-> transport connection underneath HTTP
IP
-> packet routing between hosts
HLS and MPEG-DASH sit above HTTP: they define the media structure, while HTTP delivers the playlist, manifest, and segment objects.
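On the HLS side, the streaming-format layer is plain text that the client downloads over HTTP like any other object. A master playlist lists quality variants with `#EXT-X-STREAM-INF` tags, each followed by the URI of that variant's media playlist. The Python sketch below parses a hypothetical master playlist (the bandwidth values and URIs are made up). It handles only this one tag and is not a complete HLS parser.

```python
import re

# Hypothetical master playlist; bandwidth values and URIs are made up.
MASTER_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
480p/index.m3u8
"""

# NAME=value pairs; quoted values may contain commas.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^",]*)')


def parse_master_playlist(text):
    """Return one dict per variant stream: its attributes plus the URI line after it."""
    variants = []
    pending = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = dict(ATTRIBUTE.findall(line[len("#EXT-X-STREAM-INF:"):]))
            pending = {k: v.strip('"') for k, v in attrs.items()}
        elif line and not line.startswith("#") and pending is not None:
            pending["URI"] = line  # URL of this variant's media playlist
            variants.append(pending)
            pending = None
    return variants
```

Each returned variant points to another playlist, which the client also fetches over HTTP to find that variant's segment URLs.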
Q8. Why use many segment requests instead of one huge video response?
Segment-based streaming allows the player to start quickly, buffer ahead, switch quality, and use CDN caching efficiently.
For example, if the network becomes slower, the client can request the next segment at a lower quality:
segment001 -> 1080p
segment002 -> 1080p
segment003 -> 480p
segment004 -> 480p
This is adaptive bitrate streaming: the client chooses future segments based on network speed, buffer state, and playback needs.
The key point is that the player does not have to commit to one quality level for the entire video. It can choose each upcoming segment based on current playback conditions.
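Real players use more elaborate logic, but a minimal throughput-based rule shows the idea. This Python sketch is a simplified heuristic, not the algorithm of any particular player. The bitrate ladder, the 0.8 safety factor, and the 5-second low-buffer threshold are assumptions.

```python
def choose_bitrate(ladder_bps, throughput_bps, buffer_secs,
                   safety=0.8, low_buffer_secs=5.0):
    """Pick the bitrate for the next segment with a simple throughput rule.

    safety and low_buffer_secs are illustrative tuning values.
    """
    ladder = sorted(ladder_bps)
    if buffer_secs < low_buffer_secs:
        return ladder[0]                 # buffer nearly empty: take the safest option
    budget = throughput_bps * safety     # leave headroom for throughput swings
    affordable = [b for b in ladder if b <= budget]
    return affordable[-1] if affordable else ladder[0]
```

With the ladder `[1_400_000, 2_800_000, 5_000_000]`, a measured 8 Mbit/s and 20 seconds of buffer select 5 Mbit/s. If throughput drops to 3 Mbit/s, the next segment comes from the 1.4 Mbit/s variant, similar to the 1080p to 480p switch in the example.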
Q9. Why are HLS and MPEG-DASH more CDN-friendly than WebSocket?
HLS and MPEG-DASH are CDN-friendly because they turn video into URL-addressable HTTP objects, such as playlists, manifests, and media segments.
The important structural change is that the video becomes a set of addressable HTTP resources instead of one opaque stream tied to one connection.
For example:
master.m3u8
720p/segment105.ts
720p/segment106.ts
480p/segment105.ts
480p/segment106.ts
A CDN is strongest when many clients request the same URL and can receive the same cached response from a nearby edge server.
That fits media segments well:
Client A -> GET /video/720p/segment105.ts
Client B -> GET /video/720p/segment105.ts
Client C -> GET /video/720p/segment105.ts
The first request may fetch the segment from the origin server, but later requests can often be served from the CDN edge cache.
The cache-friendly unit is not the whole video session. It is an ordinary HTTP object, such as one media segment URL.
The key point is not merely that HLS and MPEG-DASH use HTTP. It is that they create cacheable HTTP objects that a CDN can serve to many viewers.
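The edge behavior for Clients A, B, and C can be modeled in a few lines. This Python sketch is a toy, assuming the cache key is just the URL and cached objects never expire; `fetch_from_origin` is a hypothetical stand-in for the request to the origin server.

```python
class EdgeCache:
    """Toy CDN edge: caches whole HTTP objects keyed by URL.

    Real CDNs also honor Cache-Control, expiry, and cache-key rules; this sketch
    keeps every object forever.
    """

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> bytes
        self.objects = {}
        self.origin_fetches = 0

    def get(self, url):
        if url in self.objects:
            return self.objects[url], "HIT"      # served from the edge
        self.origin_fetches += 1                 # first request goes to the origin
        body = self.fetch_from_origin(url)
        self.objects[url] = body
        return body, "MISS"
```

Three requests for `/video/720p/segment105.ts` produce one `MISS` followed by two `HIT`s, and the origin is contacted only once.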
Q10. Is WebSocket the usual choice for YouTube-like video playback?
No. WebSocket is useful for bidirectional message exchange, such as chat, notifications, or real-time state updates.
YouTube-like or Netflix-like video playback usually favors HLS or MPEG-DASH over HTTP because media segments can be buffered, cached, distributed by CDNs, and selected by quality.
A practical distinction is:
chat, notifications, real-time app state
-> WebSocket
large-scale video playback with buffering, CDN caching, and quality switching
-> HLS or MPEG-DASH over HTTP
WebSocket can carry data continuously, but continuous delivery alone is not the main requirement for large-scale video playback. Cacheable segment distribution is usually more important.
Q11. Does WebSocket mean a CDN cannot sit between the client and server?
No. A CDN or proxy can sit between a WebSocket client and server if it supports WebSocket.
The more precise distinction is:
Can a CDN sit in the middle?
-> often yes
Can a CDN reuse WebSocket messages like normal HTTP cached objects?
-> usually no
After the initial HTTP Upgrade handshake, WebSocket data flows as messages inside a persistent bidirectional connection. Those messages do not carry their own URLs, HTTP methods, status codes, or cache headers, so a CDN has no standard way to cache and reuse them.
So with WebSocket, the CDN often acts as a connection proxy, not as a normal HTTP object cache.
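The contrast with an HTTP object is visible at the byte level. In RFC 6455, only the opening handshake is HTTP: the client sends a `GET` with `Upgrade: websocket` and a `Sec-WebSocket-Key` header, and the server answers with `Sec-WebSocket-Accept`. After that, a message is just a small frame header plus payload. This Python sketch computes the accept value and builds a short masked client text frame; it ignores extensions, fragmentation, and payloads longer than 125 bytes.

```python
import base64
import hashlib
import os
from typing import Optional

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID defined by RFC 6455


def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server returns in Sec-WebSocket-Accept during the HTTP upgrade."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def client_text_frame(text: str, mask_key: Optional[bytes] = None) -> bytes:
    """Build one masked client-to-server text frame (payload up to 125 bytes)."""
    payload = text.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("this sketch only handles the 7-bit payload length")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask every frame they send
    header = bytes([0x81, 0x80 | len(payload)])  # FIN + text opcode; MASK bit + length
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked
```

The frame for `"hi"` is eight bytes: two header bytes, a four-byte masking key, and the masked payload. Nothing in it is a URL, method, status code, or cache header that a CDN could use as a cache key.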
Q12. Could a CDN inspect WebSocket messages and return cached responses?
Technically yes, but that requires custom edge application logic.
The edge application would need to understand the WebSocket message format, create a cache key, decide whether the response is safe to reuse, and send a protocol-specific response.
For example, this can make sense for safe request-response lookups:
public product price lookup
public configuration lookup
public score snapshot
In general, the message has to behave like a safe lookup: the same input produces the same response, the result is not user-specific or authorization-sensitive, and slightly stale data is acceptable.
But it is a poor fit for flows where message order, user state, authorization, or immediate freshness changes the meaning:
chat
user-specific notifications
game input
video call media
order processing
real-time control commands
So WebSocket message caching is possible in a custom application sense, but it is not ordinary CDN caching.
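A rough sketch of such edge logic shows the extra decisions involved: which messages are safe to answer from cache, what the cache key is, and how stale a reply may be. It is custom application code written in Python for illustration, not a CDN feature, and it assumes a made-up JSON message format and a hypothetical allowlist of lookup types.

```python
import json
import time

# Hypothetical message types that behave like safe, public lookups.
CACHEABLE_TYPES = {"price_lookup", "config_lookup", "score_snapshot"}


class EdgeMessageCache:
    """Custom edge logic: answer safe lookup messages from a short-lived cache.

    Assumes a made-up JSON format {"type": ..., "params": {...}}; anything else
    (chat, orders, game input) is always forwarded to the backend.
    """

    def __init__(self, backend, ttl_secs=2.0, clock=time.monotonic):
        self.backend = backend    # callable: raw message -> raw response
        self.ttl_secs = ttl_secs  # how much staleness the lookup can tolerate
        self.clock = clock
        self.entries = {}         # cache key -> (response, stored_at)

    def handle(self, raw_message):
        msg = json.loads(raw_message)
        if msg.get("type") not in CACHEABLE_TYPES:
            return self.backend(raw_message)  # order- or user-sensitive: never cached
        # Deterministic cache key from the message type and its parameters.
        key = json.dumps([msg["type"], msg.get("params", {})], sort_keys=True)
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl_secs:
            return entry[0]
        response = self.backend(raw_message)
        self.entries[key] = (response, now)
        return response
```

The allowlist is the important design choice: the edge caches only message types known to be public, deterministic lookups, and forwards everything else untouched.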
Q13. What is the practical decision rule?
Use HLS or MPEG-DASH over HTTP when many users need to watch the same video reliably through buffering, quality switching, and CDN-cacheable segments.
Use WebSocket for real-time messages around the video, such as chat, notifications, presence, or live reactions.
Use WebRTC instead of HLS when the core requirement is low-latency two-way audio, video, or data communication, such as video calls or real-time screen sharing.
The central model is:
large-scale video playback
-> cacheable HTTP segments
-> HLS or MPEG-DASH
real-time app messages
-> persistent bidirectional messages
-> WebSocket
low-latency two-way media
-> real-time media transport
-> WebRTC