How Does HTTP Deliver Data for Video Streaming?

Q1. How Does HTTP Deliver Data for Video Streaming?

HTTP delivers video streaming data by letting the client repeatedly request media metadata and small time-based media segments, while the player buffers, decodes, and plays the received data.

The important point is that playback and downloading overlap in time. The client does not usually wait for the entire video file to finish downloading before playback begins.

So HTTP video streaming is usually not one endless server push. It is a client-driven process in which the player repeatedly fetches time-based media segments, typically over persistent HTTP connections.
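To make the overlap between downloading and playback concrete, here is a rough back-of-the-envelope sketch. The bitrate, bandwidth, and function name are illustrative assumptions, and the model ignores manifest requests, latency, and connection setup.

```python
def seconds_until_playback(media_seconds: float,
                           media_bitrate_bps: float,
                           bandwidth_bps: float) -> float:
    """Time needed to download `media_seconds` worth of media."""
    return media_seconds * media_bitrate_bps / bandwidth_bps


VIDEO_SECONDS = 10 * 60   # assumed 10-minute video
SEGMENT_SECONDS = 4       # assumed segment length
BITRATE = 4_000_000       # assumed 4 Mbit/s encoding
BANDWIDTH = 8_000_000     # assumed 8 Mbit/s network

# Waiting for the whole file before playing:
print(seconds_until_playback(VIDEO_SECONDS, BITRATE, BANDWIDTH))    # 300.0
# Waiting only for the first segment:
print(seconds_until_playback(SEGMENT_SECONDS, BITRATE, BANDWIDTH))  # 2.0
```

Under these assumptions the player can start after about 2 seconds instead of 5 minutes, and it keeps downloading later segments while the earlier ones play.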

Q2. Is the video sent as one continuous HTTP response?

Usually no. Modern internet video streaming commonly divides the media into small time-based segments, typically a few seconds each.

For example:

segment001 -> 0-4 seconds
segment002 -> 4-8 seconds
segment003 -> 8-12 seconds

The viewer experiences one continuous video, but the client may internally request and receive many small media segments.
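The mapping in the example can be sketched as a small function. The fixed 4-second duration and the `segmentNNN` naming follow the example above; the function name is hypothetical, and real streams list their segments in a playlist or manifest rather than computing names this way.

```python
SEGMENT_SECONDS = 4  # assumed fixed segment duration, as in the example


def segment_for_time(playback_seconds: float) -> str:
    """Return the name of the segment that contains the given playback time."""
    if playback_seconds < 0:
        raise ValueError("playback time must be non-negative")
    # Segments are numbered from 1: 0-4 s -> 001, 4-8 s -> 002, ...
    index = int(playback_seconds // SEGMENT_SECONDS) + 1
    return f"segment{index:03d}"


print(segment_for_time(0))    # segment001
print(segment_for_time(5.5))  # segment002
print(segment_for_time(8))    # segment003
```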

Q3. What does the client actually receive?

The client is not receiving finished screen images. It receives compressed and encoded video/audio data.

A simplified flow is:

compressed video/audio data
-> network transfer
-> client buffer
-> decoder
-> screen and speaker output

The player turns the received media data into video frames and audio output.

Q4. Why does one video create many HTTP requests and responses?

A single video can involve many HTTP requests and responses because the client may request a playlist, a quality-specific manifest, initialization data, and many media segments.

For example, if each segment contains 4 seconds of media, a 10-minute video can require about 150 segment requests, not counting manifest and other supporting requests.

The key correction is that these requests are not usually made per video frame. They are usually made per media segment.
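The 150-request estimate is simple division, rounded up so that a final partial segment still counts. A minimal sketch, with a hypothetical function name:

```python
import math


def segment_request_count(video_seconds: float, segment_seconds: float) -> int:
    """Number of media segments needed to cover the whole video."""
    # Round up: a final partial segment still needs its own request.
    return math.ceil(video_seconds / segment_seconds)


print(segment_request_count(10 * 60, 4))  # 150
print(segment_request_count(10 * 60, 6))  # 100
```

For comparison, the same 10-minute video at an assumed 30 frames per second contains 18,000 frames, which is why per-frame requests would be impractical.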

Q5. Does every HTTP request require a new handshake?

No. A handshake happens when a connection is created, not every time an HTTP request is sent.

A more accurate model is:

create TCP connection
-> TCP 3-way handshake
 
create HTTPS session
-> TLS handshake
 
send many HTTP requests and responses
-> reuse the existing connection when possible

So many segment requests do not imply the same number of TCP handshakes.

The key distinction is that HTTP requests are repeated application-layer messages, while TCP, TLS, or QUIC handshakes belong to connection setup.
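The difference between request count and handshake count can be shown with a toy model. This is not a real HTTP client: the class, the host `cdn.example.com`, and the URL layout are assumptions for illustration, and the model ignores idle timeouts, server-side connection closes, and parallel connections.

```python
from urllib.parse import urlsplit


class ConnectionPool:
    """Toy model: one reusable connection per (scheme, host, port)."""

    def __init__(self) -> None:
        self.open_connections = set()
        self.handshakes = 0
        self.requests = 0

    def get(self, url: str) -> None:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        key = (parts.scheme, parts.hostname or "", port)
        if key not in self.open_connections:
            # Connection setup (TCP + TLS, or QUIC) is paid once per connection.
            self.open_connections.add(key)
            self.handshakes += 1
        # Each HTTP request is just another message on the open connection.
        self.requests += 1


pool = ConnectionPool()
for i in range(1, 151):
    pool.get(f"https://cdn.example.com/720p/segment{i:03d}.ts")

print(pool.requests, pool.handshakes)  # 150 1
```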

Q6. What changes with HTTP/1.1, HTTP/2, and HTTP/3?

With HTTP/1.1, persistent connections allow multiple requests and responses to reuse the same TCP connection.

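With HTTP/2, multiple requests and responses are multiplexed as concurrent streams over one TCP connection, so several playlist or segment fetches can be in flight at once without opening extra connections.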

With HTTP/3, HTTP runs over QUIC (which itself runs over UDP) instead of TCP, so connection setup is a QUIC handshake with TLS 1.3 built in rather than a TCP 3-way handshake followed by a separate TLS handshake.

The general principle is the same across all three versions: connection setup is paid once per connection, and the many segment requests that follow reuse it.

Q7. What are HLS and MPEG-DASH doing on top of HTTP?

Viewer-side video streaming commonly uses HTTP-based adaptive streaming, especially HLS or MPEG-DASH.

Neither one replaces HTTP. They define how media playlists, manifests, quality variants, and segments are organized so that the client can fetch them over HTTP.

A useful layering model is:

HLS / MPEG-DASH
-> streaming format and segment selection rules
 
HTTP/1.1, HTTP/2, or HTTP/3
-> request/response delivery of playlists and segments
 
TCP or QUIC
-> transport connection underneath HTTP
 
IP
-> packet routing between hosts

HLS and MPEG-DASH sit above HTTP: they define the media structure, while HTTP delivers the playlist, manifest, and segment objects.

Q8. What does HLS change about video delivery?

HLS changes video delivery from “one continuous server-pushed stream per user” into “many small HTTP objects that the player requests as needed.”

Those HTTP objects can include a master playlist, quality-specific playlists, and media segment files.

For example:

master.m3u8
720p/segment105.ts
720p/segment106.ts
480p/segment105.ts
480p/segment106.ts

The important structural change is that the video becomes URL-addressable HTTP resources instead of one opaque stream tied to one connection.
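As a rough sketch of what a player does with `master.m3u8`, the code below parses a small master playlist into its quality variants. The playlist contents, bandwidth values, and `playlist.m3u8` file names are made up for illustration, and the parser handles only the `#EXT-X-STREAM-INF` tag, not the full HLS format.

```python
import re

# Hypothetical master playlist with two quality variants.
MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
480p/playlist.m3u8
"""

# Matches KEY=value pairs; a value is either quoted or runs to the next comma.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_variants(master_text: str) -> list:
    """Return one dict per variant stream: its attributes plus its playlist URI."""
    variants = []
    pending = None  # attributes from the most recent #EXT-X-STREAM-INF line
    for line in master_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attribute_text = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTRIBUTE.findall(attribute_text)}
        elif line and not line.startswith("#") and pending is not None:
            # The first URI line after a STREAM-INF tag belongs to that variant.
            pending["URI"] = line
            variants.append(pending)
            pending = None
    return variants


for variant in parse_variants(MASTER_PLAYLIST):
    print(variant["BANDWIDTH"], variant["RESOLUTION"], variant["URI"])
```

Each variant URI points to a quality-specific playlist, which in turn lists segment URLs such as `720p/segment105.ts`.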

Q9. Why use many segment requests instead of one huge video response?

Segment-based streaming allows the player to start quickly, buffer ahead, switch quality, and use CDN caching efficiently.

For example, if the network becomes slower, the client can request the next segment at a lower quality:

segment001 -> 1080p
segment002 -> 1080p
segment003 -> 480p
segment004 -> 480p

This is adaptive bitrate streaming: the client chooses future segments based on network speed, buffer state, and playback needs.

The key point is that the player does not have to commit to one quality level for the entire video. It can choose each upcoming segment based on current playback conditions.
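A per-segment quality choice can be sketched as follows. The bitrate ladder, the safety factors, and the 8-second buffer threshold are assumptions chosen for illustration; real players use more elaborate heuristics.

```python
# Hypothetical bitrate ladder: (label, bits per second), highest first.
LADDER = [("1080p", 5_000_000), ("720p", 2_800_000), ("480p", 1_400_000)]


def choose_quality(throughput_bps: float, buffer_seconds: float) -> str:
    """Pick the quality label for the next segment."""
    # Be more conservative when the buffer is nearly empty (assumed thresholds).
    safety = 0.5 if buffer_seconds < 8 else 0.8
    budget = throughput_bps * safety
    for label, bitrate in LADDER:
        if bitrate <= budget:
            return label
    # Nothing fits: fall back to the lowest rung rather than stalling.
    return LADDER[-1][0]


print(choose_quality(8_000_000, 20))  # 1080p
print(choose_quality(8_000_000, 4))   # 720p
print(choose_quality(1_000_000, 20))  # 480p
```

The function is called once per upcoming segment, so the choice can change whenever the measured throughput or the buffer level changes.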

Q10. Why does the HLS or MPEG-DASH structure match CDN caching?

A CDN is strongest when many clients request the same URL and can receive the same cached response from a nearby edge server.

HLS and MPEG-DASH segments fit this model because the same segment file can be requested by many viewers and served repeatedly without going back to the origin server.

The cache-friendly unit is not “the whole video session.” It is an ordinary HTTP object, such as:

720p/segment105.ts

If many viewers request that same segment URL, the CDN can reuse the cached object.
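The reuse can be illustrated with a toy edge cache keyed by URL. The class and the stand-in origin function are hypothetical, and the model leaves out expiry, cache-control headers, and eviction.

```python
class EdgeCache:
    """Toy edge cache keyed by URL; counts how often the origin is contacted."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin
        self._store = {}
        self.origin_fetches = 0

    def get(self, url: str) -> bytes:
        if url not in self._store:
            # Cache miss: go to the origin once and keep the response.
            self._store[url] = self._fetch_from_origin(url)
            self.origin_fetches += 1
        return self._store[url]


def fake_origin(url: str) -> bytes:
    # Stand-in for the origin server; returns placeholder bytes.
    return f"media bytes for {url}".encode()


edge = EdgeCache(fake_origin)
for _viewer in range(1000):
    edge.get("720p/segment105.ts")

print(edge.origin_fetches)  # 1
```

One thousand viewer requests for the same segment URL reach the origin only once in this model.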

Q11. Is WebSocket the usual choice for YouTube-like video playback?

No. WebSocket is useful for bidirectional message exchange, such as chat, notifications, or real-time app state updates.

YouTube-like or Netflix-like video playback usually favors HLS or MPEG-DASH over HTTP because media segments can be cached, distributed by CDNs, and selected by quality.

A practical distinction is:

chat, notifications, real-time app state
-> WebSocket
 
large-scale video playback with buffering, CDN caching, and quality switching
-> HLS or MPEG-DASH over HTTP

WebSocket can carry data continuously, but continuous delivery alone is not the main requirement for large-scale video playback. Cacheable segment distribution is usually more important.

Q12. Why does WebSocket not match the normal CDN cache model well?

WebSocket does not match the normal CDN cache model well because, after the initial HTTP upgrade, data flows as messages inside a persistent connection rather than as independent HTTP requests and responses.

Normal CDN caching depends on recognizable HTTP objects:

URL
method
headers
response body
cache-control rules

HLS and MPEG-DASH expose video data through those HTTP objects.

WebSocket messages, by contrast, are usually interpreted by the application protocol running inside the connection. The CDN cannot automatically treat each message as a cacheable HTTP response unless custom edge logic is added.

Q13. Does that mean a CDN cannot sit between a WebSocket client and server?

No. A CDN or proxy can sit between a WebSocket client and server if it supports WebSocket.

But in that case it usually acts as a connection proxy, not as a normal HTTP object cache.

The distinction is:

CDN for HLS/DASH
-> cache one segment response and reuse it for many clients
 
CDN/proxy for WebSocket
-> keep or forward long-lived connections between clients and servers

So the issue is not whether a CDN can be present. The issue is whether the CDN can reuse the same cached media object for many viewers.

Q14. Could a CDN cache WebSocket messages?

Technically yes, but that requires custom application logic at the edge.

The edge application would need to understand the WebSocket message format, create its own cache key, decide whether the message is safe to cache, and send a protocol-specific response.

That is not ordinary CDN caching. It is closer to building a custom request-response cache inside a WebSocket application.

Caching WebSocket messages is reasonable only when the message behaves like a safe lookup:

same input -> same response
no user-specific state
no authorization-sensitive result
slightly stale response is acceptable

It is a bad fit for chat, user-specific notifications, game input, order processing, video call media, and other flows where order, user state, authorization, or immediate freshness changes the meaning of the response.
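The four conditions above can be written as a simple check that custom edge logic might apply before caching a message. The type and field names are hypothetical, and deciding the value of each field for a real message is the hard part that this sketch does not cover.

```python
from dataclasses import dataclass


@dataclass
class MessageTraits:
    deterministic: bool            # same input -> same response
    user_specific: bool            # result depends on per-user state
    authorization_sensitive: bool  # result depends on who is asking
    staleness_ok: bool             # a slightly stale response is acceptable


def safe_to_cache(traits: MessageTraits) -> bool:
    """True only when every condition for a safe lookup holds."""
    return (
        traits.deterministic
        and not traits.user_specific
        and not traits.authorization_sensitive
        and traits.staleness_ok
    )


# A public lookup: cacheable under these assumptions.
lookup = MessageTraits(deterministic=True, user_specific=False,
                       authorization_sensitive=False, staleness_ok=True)
# A chat message: tied to user state and freshness.
chat = MessageTraits(deterministic=False, user_specific=True,
                     authorization_sensitive=True, staleness_ok=False)

print(safe_to_cache(lookup))  # True
print(safe_to_cache(chat))    # False
```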

Q15. How does WebRTC fit into this distinction?

WebRTC is a better fit than HLS or MPEG-DASH when the main requirement is low-latency bidirectional audio, video, or data communication.

Examples include:

video calls
real-time screen sharing
interactive voice communication
low-latency peer-to-peer media

HLS or MPEG-DASH is usually better when the main requirement is stable large-scale video distribution through CDN infrastructure.

The practical distinction is:

many viewers watching the same media with buffering
-> HLS or MPEG-DASH
 
participants exchanging live media with very low latency
-> WebRTC
 
real-time messages around the media
-> WebSocket

Q16. What is the decision rule?

Use HLS or MPEG-DASH when many users need to watch the same video reliably through cacheable HTTP segments.

Use WebRTC when users need low-latency two-way media interaction.

Use WebSocket for real-time messages around the video rather than for CDN-friendly video delivery itself.

The final distinction is that HLS and MPEG-DASH make video delivery cacheable and scalable by turning media into HTTP segment objects, while WebSocket keeps a live message connection that is useful for interaction but not naturally cacheable as CDN-served video content.
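The decision rule can be written down as a lookup table. This is a direct transcription of the rule rather than a selection algorithm; the enum and function names are hypothetical, and a real product may combine all three technologies.

```python
from enum import Enum


class Need(Enum):
    SHARED_PLAYBACK = "many viewers watching the same media with buffering"
    LIVE_TWO_WAY_MEDIA = "participants exchanging live media with very low latency"
    REALTIME_MESSAGES = "real-time messages around the media"


# One entry per branch of the decision rule.
DELIVERY_CHOICE = {
    Need.SHARED_PLAYBACK: "HLS or MPEG-DASH",
    Need.LIVE_TWO_WAY_MEDIA: "WebRTC",
    Need.REALTIME_MESSAGES: "WebSocket",
}


def choose_delivery(need: Need) -> str:
    """Return the technology the decision rule suggests for this need."""
    return DELIVERY_CHOICE[need]


for need in Need:
    print(f"{need.value} -> {choose_delivery(need)}")
```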
