Internet video

Thursday 22nd of November 2012 02:13:31 PM


Online audio and video formats

Multimedia container formats

  • Audio
  • Video
    • MP4
    • WMV, WMA, AVI (Microsoft)
    • MOV (Apple)
    • FLV - Flash Video Format (Adobe)


  • Container: a container or wrapper format is a metafile format whose specification describes how different data elements and metadata coexist in a computer file.
    • Header
      • The header usually identifies which streams are present in the container and which codec was used to compress each stream's data.
      • The streams in a container are compressed with codecs. A codec can be thought of as a key that tells the player how to open a stream and decompress its data.
    • Packets (audio data, video data, and potentially other data like subtitles)
      • Data is compressed or encoded to take up less space
      • Packets are organized into the appropriate stream (based on information obtained from the header): each packet tells the player which stream it belongs to. In most containers, each packet also carries a time stamp: the time, in some unit relative to the start of the container, at which the audio should be heard, the video shown, or the subtitle displayed. In FLV, time stamps are in milliseconds (1/1000 of a second).
    • Trailer (not in FLV)
    • Streams: a container holds one or more streams, each a collection of packets that should be treated as sequential data.
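To make the header/packet/time stamp structure above concrete, here is a minimal sketch of reading an FLV file held in memory. It assumes the FLV layout as described in Adobe's FLV specification (a 9-byte header with audio/video flag bits, then tags made of an 11-byte tag header, a payload, and a 4-byte "previous tag size"). It only splits the file into packets; it does not decode the codec data inside them. The helper `make_tag` and the demo bytes are hypothetical, built only so the example runs on its own.

```python
import struct

# Tag type codes used by FLV: 8 = audio, 9 = video, 18 = script data (metadata).
TAG_TYPES = {8: "audio", 9: "video", 18: "script"}


def parse_flv(data: bytes):
    """Split an in-memory FLV file into its header fields and a list of packets (tags)."""
    # Header: 3-byte "FLV" signature, version, flags, header size (9 bytes total).
    signature, version, flags, header_size = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    header = {
        "version": version,
        "has_audio": bool(flags & 0x04),  # flag bits announce which streams exist
        "has_video": bool(flags & 0x01),
    }

    # Skip the header and the 4-byte PreviousTagSize0 field that follows it.
    pos = header_size + 4
    packets = []
    while pos + 11 <= len(data):
        tag_type = data[pos]
        data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
        # 24-bit time stamp in milliseconds, plus one extension byte for the upper 8 bits.
        timestamp = int.from_bytes(data[pos + 4:pos + 7], "big") | (data[pos + 7] << 24)
        # Bytes pos+8 .. pos+10 hold a stream ID that is always 0 in FLV.
        payload = data[pos + 11:pos + 11 + data_size]
        packets.append({
            "stream": TAG_TYPES.get(tag_type, "unknown"),
            "timestamp_ms": timestamp,
            "payload": payload,
        })
        # Move past the payload and the 4-byte PreviousTagSize after every tag.
        pos += 11 + data_size + 4
    return header, packets


def make_tag(tag_type: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Hypothetical helper that builds one FLV tag, used only to create demo input."""
    body = (bytes([tag_type])
            + len(payload).to_bytes(3, "big")
            + (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")
            + bytes([(timestamp_ms >> 24) & 0xFF])
            + b"\x00\x00\x00"  # stream ID
            + payload)
    return body + len(body).to_bytes(4, "big")  # PreviousTagSize


# A tiny synthetic file: header (audio + video flags set), then three tags.
# The payloads are placeholder bytes, not real codec data.
flv = (b"FLV" + bytes([1, 0x05]) + (9).to_bytes(4, "big") + bytes(4)
       + make_tag(9, 0, b"keyframe")
       + make_tag(8, 0, b"audio0")
       + make_tag(9, 40, b"diff1"))
header, packets = parse_flv(flv)
print(header)
for p in packets:
    print(p["stream"], p["timestamp_ms"], p["payload"])
```

Note that FLV has no trailer, so the loop simply stops at the end of the data; a container with a trailer (or an index) would need extra handling.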

Key Frame Compression

Terms and definitions:

  • Key frame: a frame in which a complete image is stored in the data stream; that is, a key frame contains all of the necessary data to display the image.
  • The first packet in a stream is normally a key frame, because a player cannot begin to reproduce a video until it receives a key frame.
  • After the first key frame, only the image differences ("diffs", called inter frames) are sent; each one is applied to the previous frame to produce the next frame.
  • Inter frame: a frame stored as the difference between the current frame and a previous frame (typically the last one), rather than as a complete image.
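The key frame / inter frame idea can be illustrated with a toy model. This sketch assumes each frame is just a short list of pixel brightness values and that an inter frame is a plain per-pixel subtraction; real codecs such as H.264 use far more elaborate prediction (motion compensation, transforms, entropy coding), so treat this only as an illustration of the principle. The names `encode`, `decode`, and `key_interval` are hypothetical.

```python
def encode(frames, key_interval=3):
    """Turn raw frames into ("key", pixels) and ("inter", diff) packets.

    Every key_interval-th frame is stored whole; the others are stored as
    per-pixel differences from the frame before them.
    """
    packets = []
    previous = None
    for i, frame in enumerate(frames):
        if previous is None or i % key_interval == 0:
            packets.append(("key", list(frame)))  # complete image
        else:
            diff = [cur - prev for cur, prev in zip(frame, previous)]
            packets.append(("inter", diff))
        previous = frame
    return packets


def decode(packets):
    """Rebuild frames by applying each inter frame to the previous frame."""
    frames = []
    current = None
    for kind, data in packets:
        if kind == "key":
            current = list(data)
        elif current is None:
            # No key frame received yet, so there is nothing to apply the diff to.
            continue
        else:
            current = [prev + d for prev, d in zip(current, data)]
        frames.append(current)
    return frames


# Each "frame" is a toy image of three pixel brightness values.
frames = [[10, 10, 10], [10, 12, 10], [11, 12, 10], [11, 12, 13]]
packets = encode(frames, key_interval=3)
print(packets)
print(decode(packets) == frames)
# A player that joins mid-stream must skip ahead to the next key frame.
print(decode(packets[1:]))
```

The last call shows why the first packet is normally a key frame: a decoder that starts on inter frames has nothing to apply them to and can only begin displaying at the next key frame.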

Media players

Media players convert containers into video and audio that can be seen and heard. A media player has a decoder for each audio, video, or subtitle stream in the container.

The player opens the container and reads the header to determine which codecs were used. It then uses that information to configure the decoders and starts reading the packets. If a packet is an audio packet, the player sends it to the audio decoder, which decompresses the packet into sound and plays it through the speakers at the right time, based on the packet's time stamp.

A similar process is performed for video: the player sends each video packet to the video decoder, which decompresses it (potentially using earlier key frames and inter frames) and shows the picture on the screen at the right time. Media players compare the time stamps of the audio and video packets to decide when to play each, which keeps the audio and video in sync.
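The player loop described above (read the header, configure one decoder per stream, route each packet to its decoder, present output by time stamp) can be sketched as a small simulation. Everything here is hypothetical: `LabelDecoder` stands in for a real decoder and only labels the payload, the codec names are illustrative, and packets are plain `(stream_index, timestamp_ms, payload)` tuples rather than real container data.

```python
import heapq


class LabelDecoder:
    """Hypothetical stand-in for a real decoder: it just tags the payload."""

    def __init__(self, codec):
        self.codec = codec

    def decode(self, payload):
        return f"{self.codec}:{payload}"


def play(header_codecs, packets):
    """Simulate a player.

    header_codecs: stream index -> codec name, as read from the container header.
    packets: (stream_index, timestamp_ms, payload) tuples in file order.
    Returns (timestamp_ms, decoded_output) pairs in presentation order.
    """
    # 1. Configure one decoder per stream from the header.
    decoders = {idx: LabelDecoder(codec) for idx, codec in header_codecs.items()}

    # 2. Read packets in file order and route each one to its stream's decoder.
    #    The file order breaks ties between packets with equal time stamps.
    queue = []
    for order, (stream, ts, payload) in enumerate(packets):
        heapq.heappush(queue, (ts, order, decoders[stream].decode(payload)))

    # 3. Present decoded output in time stamp order. A real player would wait
    #    on a clock until each time stamp is reached instead of returning a list.
    timeline = []
    while queue:
        ts, _, output = heapq.heappop(queue)
        timeline.append((ts, output))
    return timeline


codecs = {0: "h264", 1: "aac"}
packets = [(0, 0, "v0"), (1, 0, "a0"), (0, 40, "v1"), (1, 23, "a1")]
for ts, output in play(codecs, packets):
    print(ts, output)
```

In the demo the video packet at 40 ms appears in the file before the audio packet at 23 ms, but the time stamps, not the file order, decide when each is presented.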

Container diagram

Terms and definitions

  • FFmpeg: a free software project that produces libraries and programs for handling multimedia data
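  • Transcoding: converting media from one encoding to another (for example, to a different codec, bit rate, or container format), typically by decoding it and re-encoding the result
  • Codecs: programs or algorithms that encode (compress) and decode (decompress) audio or video data; the name is a contraction of coder-decoder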
  • Frame rate (also known as frame frequency): the frequency at which an imaging device produces unique consecutive images called frames. For video, it is the number of frames per second that appear in a container.
  • Time base: the unit of time that a stream in a container uses for each "tick" of its clock (for example, 1/1000 of a second in FLV)
  • Bit rate (sometimes written bitrate, or as the variable R): the number of bits that are conveyed or processed per unit of time
  • Sampling rate (also sample rate or sampling frequency): the number of samples per unit of time (usually seconds) taken from a continuous signal to make a discrete signal
  • Real Time Messaging Protocol (RTMP): a protocol, initially proprietary, developed by Macromedia for streaming audio, video, and data over the Internet between a Flash player and a server
  • Real Time Streaming Protocol (RTSP): a network control protocol designed for use in entertainment and communications systems to control streaming media servers. It is used to establish and control media sessions between end points; clients of media servers issue VCR-like commands, such as play and pause, to control playback of media files on the server in real time.
  • Real-time Transport Protocol (RTP): defines a standardized packet format for delivering audio and video over IP networks
  • H.264: a video compression standard, and currently one of the most commonly used formats for recording, compressing, and distributing high-definition video
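Several of the terms above (time base, frame rate, bit rate, sampling rate) are tied together by simple arithmetic. The sketch below assumes a time base is expressed as seconds per tick and uses Python's `fractions.Fraction` to avoid rounding errors. The function names are hypothetical, the 90 kHz clock in the demo is just a commonly used example value for video, and the size estimate ignores container overhead.

```python
from fractions import Fraction


def ticks_to_seconds(ticks, time_base):
    """A time stamp in ticks times the time base (seconds per tick) gives seconds."""
    return ticks * time_base


def frame_duration_ticks(frame_rate, time_base):
    """Ticks per frame: one frame lasts 1 / frame_rate seconds."""
    return (1 / Fraction(frame_rate)) / time_base


def size_bytes(bit_rate_bps, seconds):
    """Approximate stream size: bits per second * seconds / 8 bits per byte."""
    return bit_rate_bps * seconds / 8


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed (PCM) audio: samples/s * bits/sample * channels."""
    return sample_rate_hz * bits_per_sample * channels


flv_time_base = Fraction(1, 1000)  # FLV: one tick = 1/1000 of a second
print(ticks_to_seconds(2500, flv_time_base))
print(frame_duration_ticks(25, flv_time_base))
print(frame_duration_ticks(Fraction(30000, 1001), Fraction(1, 90000)))
print(size_bytes(1_000_000, 60))
print(pcm_bit_rate(44100, 16, 2))
```

For example, at 25 frames per second each frame lasts 40 ticks of a 1/1000-second clock, and one minute of a 1 Mbit/s stream is about 7.5 million bytes. Uncompressed CD-quality stereo audio (44,100 samples per second, 16 bits, 2 channels) works out to 1,411,200 bits per second, which shows why audio codecs are used.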