Ethernet-Based Live Television Production

Danvers Flett

Global Television, 55 Coventry st, Southbank, Vic 3006, Australia

Written for presentation at the SMPTE11 Australia Conference

First published in Content and Technology, Volume 9, Issue 1, Darlinghurst, NSW, Australia: Broadcastpapers Pty Ltd, pp. 30-36, 15 February 2012

Abstract. Ethernet is poised to become the backbone of the live television production industry. Live television broadcasts have for decades been produced using the circuit-switched method, but recent networking advances have enhanced Ethernet to the point where packet-switching can reliably process and deliver multiple uncompressed, high-definition television signals. By moving to near-commodity Ethernet-based technologies the television industry stands to make significant cost savings over currently-used switching and distribution equipment. This paper examines the current and near-future feasibility and impact of using Ethernet as the sole medium for distributing all signals in a television studio or outside broadcast.

Keywords. Ethernet, television, uncompressed, High Definition, 1080i, 1080p, multicamera, live television production, packet-switched, circuit-switched, 100GbE, 100Gbps, Gigabit Ethernet, Audio Video Bridging, AVB, multiplexing, genlock, IEEE 802.1AS, Timing and Synchronization for Time-Sensitive Applications, Precision Time Protocol, PTP, IEEE 802.1Qat, Stream Reservation Protocol, SRP, Internet Protocol version 6, IPv6, latency, low-latency, HDSDI, SDI, HBRMT, vision mixer, camera, CCU, router, distribution amplifier, EVS, replay, multiviewer, topology, price curve, cost comparison


In the near future, television broadcast engineers will spend less time plugging in cables and will instead be able to build powerful broadcast facilities far more quickly and cheaply than today.  Ethernet, the technology that has come to dominate the Information and Communications Technology industries, is poised to become the backbone of the live television production industry.  Live television broadcasts have for decades been produced using the circuit-switched method, but recent networking advances have enhanced Ethernet to the point where packet-switching can reliably process and deliver multiple uncompressed, high-definition television signals.  By moving to near-commodity Ethernet-based technologies the television industry stands to make significant cost savings over currently-used switching and distribution equipment.

The Status Quo – Circuit Switching

Whilst video-over-IP is not uncommon today in contribution and distribution environments, live video production environments still generally use baseband, one-signal-per-cable circuit-switching.  There are a number of reasons for this, foremost being that until recently it was the only practical way to distribute and switch live video.

Circuit-switched video as a system offers high-reliability in video switching and transmission.  Instant availability of every video source, and the ability to switch to any other video source instantly is of critical importance in multicamera video production. 

A circuit-switched system guarantees the availability of every source.  Every cable is guaranteed to be able to deliver one full-bandwidth video signal without dropouts or delay.  In a single-signal-per-cable system, physical damage to a cable will only disconnect one signal.

Live Means Live

In live television there is very little tolerance of latency in video signals.  As many of the cameras are pointed at the same object or person, any slight differences in delay quickly become jarringly apparent when the director switches between them.  The cameras must all be synchronised, that is, have the same or as close to the same video delay when they reach the video switcher.  Delays of one frame (20 ms) or more are noticeable and cause irritation.

In a live environment it is expected that equipment used in the transmission, processing and distribution of video signals introduce almost zero delay wherever possible.  To a certain degree, any live video signal that has been delayed by 20ms or more is considered to be “tainted” and out of sync with the rest of the system.   A delayed signal becomes problematic.  Delaying all the other signals to match it is generally not practical.  Delaying a live video signal requires processing equipment.  There may be many live video sources in a multicamera production; inserting processing into every one of them would be expensive.

To summarise: in a multicamera production environment, live signals must have a guaranteed and consistent latency of as close to zero as possible.

Traditional Ethernet’s Unsuitability to the Task

Traditionally, the television industry has rightly viewed Ethernet as a medium unable to deliver live television signals, with their very high demands on bandwidth, instant and constant availability, and low latency.

Ethernet is a packet-switching network.  Information transmitted is broken up into “packets” and sent over the network.  Packets can arrive at the destination with varying delays and out of order.  Standard Ethernet does not guarantee reliability.  Ethernet will drop packets to maintain traffic flow if switches become congested.  This is an ideal medium for Information Technology networking – sharing files, web pages, emails and so on – where variable delays in transmission are barely noticeable, and complete, accurate delivery of information is preferred even at the cost of extra delay.

Ethernet’s “natural” or designed tendency in its basic form is to slow down all traffic flows to enable fair bandwidth sharing if the medium becomes saturated with traffic.  Higher-layer protocols such as TCP add guaranteed delivery mechanisms whereby lost packets are retransmitted.  While there are standards that can be applied to provide Quality of Service (QoS), these are not enabled by default on generic Ethernet networks.  Tight control over the network is required to achieve guaranteed latency and error-correction behaviour.

Fig. 1.  Digital Video vs Ethernet Link Bandwidth over time


Comparison of Bandwidth Changes Over Time

Until very recently, the bandwidth of a single uncompressed digital video link greatly exceeded the practical capacity of even high-end Ethernet standards.

The most-used Standard Definition digital video standard started at 270 Mbit/s in 1993.  Interlaced High Definition video appeared in 1998 using 1485 Mbit/s, and Progressive (single-link) HD video appeared in 2006 using 2970 Mbit/s (2.97 Gbit/s)[1][2].

By comparison, the first Ethernet standard in common use appeared in 1990 with a capacity of 10 Mbit/s. Successive standards have appeared approximately every 5 years since then, each increasing the capacity by a factor of ten[3].  The most recent standard, released in 2010, has a capacity of 100 Gigabit/s.

Television Network Contribution and IPTV

Television Networks are starting to realise the usefulness of IP and Ethernet networks for use in “Contribution” (remote broadcast or news footage to studio) feeds.  Contribution feeds are usually only a single feed from a remote site, and therefore only one point-of-view-at-a-time of any particular person or object at the remote site, even if the remote site is a live multicamera broadcast.  Contribution feeds do require constant availability of the video stream, with a fixed delay, but the delay itself can be relatively long.  Some satellite contribution feeds have delays of over a second.

In this environment, television networks[4][5] have begun to use compressed video-over-IP.  This allows them to use cheaper, more generic packet-switched data networks to get the remote site feed back to the studio as opposed to using circuit-switched feeds such as dedicated microwave links, fibres or satellites.  In the US, where there is an extensive fibre-optic network and a lot of packet-switched bandwidth available, some contribution feeds are now being done with uncompressed video over IP[6].  Likewise, at the 2010 Delhi Commonwealth Games, Cisco facilitated a contribution-over-IP network for the transfer of live, HD, uncompressed television signals from games venues to the IBC[7].

Internet Protocol Television (IPTV) started in the 1990s using videoconferencing software to distribute pre-produced television programming over the Internet[8]. As compression technologies evolved and available bandwidths increased, IPTV began to be seen as a serious alternative to traditional satellite and cable delivery of TV programming.  Delivery of programming to the customer requires constant availability of at least one video feed, with constant latency, but the duration of the latency is unimportant.  IPTV offers some efficiency over traditional cable and satellite delivery in that the “last mile” of cabling to the customer’s house need only have enough bandwidth for one video feed per “receiver”.  There is no need to deliver all available channels simultaneously to the customer.  The IPTV “receiver” is effectively only requesting one channel at a time to view.

The security industry is also moving towards IP-based distribution and recording of CCTV images[9].  It would be worthwhile to examine the pros and cons of this approach to see if any of these technologies could be applied to the broadcast TV industries.

The Case for Ethernet as the Common Distribution Medium

Given the requirements outlined above, circuit-switching is obviously ideally suited to live multicamera production.  A packet-switched approach to live production would need to meet the same stringent requirements of reliability and latency.  Until recently, this was not possible.

As mentioned previously, Ethernet now has the capacity to carry multiple uncompressed video streams.  10Gbit Ethernet can theoretically carry 37 uncompressed Standard Definition (625i) video streams. 100Gbit Ethernet could carry 33 1080p or 67 1080i uncompressed High Definition streams.
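The stream counts above follow from simple division of link capacity by stream bit rate; as a minimal Python sketch (bit rates as cited in this paper, ignoring packet and encapsulation overhead):

```python
# Whole uncompressed streams per Ethernet link, ignoring overhead.
# Stream rates (Mbit/s) are the nominal SDI bit rates cited in this paper.
STREAM_RATES = {"625i SD": 270, "1080i HD": 1485, "1080p HD": 2970}

def streams_per_link(link_mbps: int, stream_mbps: int) -> int:
    """Number of whole uncompressed streams that fit on one link."""
    return link_mbps // stream_mbps

assert streams_per_link(10_000, STREAM_RATES["625i SD"]) == 37   # 10GbE
assert streams_per_link(100_000, STREAM_RATES["1080p HD"]) == 33  # 100GbE
assert streams_per_link(100_000, STREAM_RATES["1080i HD"]) == 67  # 100GbE
```

In practice, encapsulation and reservation overhead would reduce these counts slightly, but the order of magnitude stands.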

Why Use Uncompressed Video?

Using compressed video seems an attractive option; HD video is often compressed from 1.5 Gbit/s to 100 Mbit/s (or less) in broadcast environments, but generally only for recording or contribution purposes.  On the surface, compressing video streams for live production might look like a money-saver, but compression has its problems.

Latency is the main issue; even the most efficient video compression and decompression hardware introduces one frame of delay.  Such hardware is expensive and proprietary, and is generally only used for wireless (RF) cameras.

In a single live production facility there are hundreds, possibly thousands of monitors, recording devices, mixer inputs and other processing hardware that need to receive video signals.  If those video signals were compressed, every single device needing to receive video would need to decompress each signal before processing or displaying the video.  Every device would then need to recompress each and every video output for further distribution.  Fast compression/decompression (codec) hardware is expensive, setting codecs to be compatible with each other can be complex, codec algorithms are mostly patent-encumbered and compression standards change quickly over time.  And again, every time live video is compressed and decompressed, delay is unavoidably introduced (and quality can be lost).

Uncompressed video is the universal standard in live production facilities presently using circuit-switched networks, and it should remain so for packet-switched networks.

Why Use Packet Switching?

Circuit-switched signals require a single cable (copper or fibre) per signal.  As the scale of television outside broadcasts increases, the amount of cabling also increases.  Each video input and video output requires at least one cable to form a connection.  Most video sources need to be distributed to multiple destinations: to video monitors, recorders and switchers.  Many video sources need to be processed before going live to air.  Each processing device has at least one input and one output connector.  Coaxial cable and its connectors are relatively expensive when compared to twisted-pair.

At an outside broadcast, the cabling outside the broadcast facility is mostly temporary.  Large outside broadcasts have many cameras, recording devices and large production crews.  Even large outside broadcast trucks are generally unable to house the entire crew and all equipment internally, so the crews and their equipment are spread out across multiple trucks and/or site sheds.  This leads to lots of interconnect cables between trucks and sheds, each cable with its own specific purpose.  The cables generally converge on the main switching facility.  The area where cables are connected, the external patch panel, can have over 500 connectors in close proximity.  The patch panel gets very congested on a large project, but it is important to keep track of what every connector and every cable does.

The permanent internal cabling of large broadcast facilities is equally complex, albeit neater.  Every input and output of every device is cabled to a patchfield or video router.  Some devices have multipin connectors for carrying multiple signals on one cable, but these are the exception in the video world.  Most modern outside broadcast trucks have many kilometres of individual strands of coaxial cabling inside, and thousands of coax connectors.  Just about every digital video input and output of every device requires a Serial Digital Interface (SDI) receiver or transmitter chip to be able to interpret, display, process or even duplicate and pass on the video stream.

Clearly, this situation would benefit from some sort of multiplexing technology.

In the Information Technology world, Ethernet is becoming the clear winner in the network standards war.  In the Television industry, Ethernet is becoming more prevalent as a medium for data transfer and remote control, replacing proprietary protocols and open, circuit-switched protocols such as RS-422.

There has been a reluctance to rely heavily on Ethernet for live video and audio signals for reasons mentioned previously.  But Ethernet technologies have emerged in recent years that make transmission of broadcast video a realistic proposition.

Firstly, Ethernet bandwidth is increasing exponentially, by a factor of 10 every five years or so.  And the cost-per-port of these high-bandwidth Ethernet connections is falling quickly[10].  While video bandwidth requirements are increasing, Ethernet capacity is increasing at a much faster rate.

The Big Hurdle: Latency

Ethernet switches are becoming much faster at switching due to their increasing bandwidth capacity.  In 2000, the best Gigabit Ethernet switches had latencies of between 15 and 30 milliseconds for large packets[11]; up to and over the duration of a single frame of video.  In early 2010, a review of 10Gbit Ethernet switches[12] showed that the worst latency for switching large, multicast packets (typical for video) was 45 microseconds, and most switches were much faster than this.  One line of HD (1080p/60fps) video is 15 microseconds in duration, so the slowest enterprise switches introduce about 3 lines of delay, and most introduce much less than one line.
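The line-duration arithmetic above can be checked directly; a quick sketch, assuming the 1125-total-line raster (active plus blanking) used for 1080-line HD:

```python
# Duration of one HD line versus worst-case switch latency.
# 1080p/60 uses 1125 total lines per frame (active + blanking).
FRAME_RATE = 60      # frames per second
TOTAL_LINES = 1125   # total lines per frame

line_us = 1e6 / (FRAME_RATE * TOTAL_LINES)  # microseconds per line
assert round(line_us, 1) == 14.8            # ~15 us, as stated above

# Worst-case 10GbE multicast switching latency from the cited review:
lines_of_delay = 45 / line_us
assert round(lines_of_delay) == 3           # about 3 lines of video
```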

In comparison, a typical device used in circuit-switched video for splitting signals for distribution and monitoring, a reclocking distribution amplifier, introduces a delay of about 3.8 nanoseconds[13] in HD video.  A video routing switcher would introduce a similarly small amount of delay.  Distribution amplifiers and routing switchers generally do not store or process video signals.  They duplicate signals at a basic electrical component level, so any delay introduced is purely electrical.

For reference, a typical video production switcher, the Grass Valley Kalypso HD, has an autotiming window of +/- 6 microseconds in 1080i/60 mode[14].  This means that source video signals received by the switcher must be synchronised to within 12 microseconds of each other.  That is, the start of each frame of video from each video source must be received within 12 microseconds of a frame from any other video source in the switcher.

Cameras and most other devices in a live television facility are “genlockable”.  That is, they are synchronised to a master timing pulse.  In a circuit-switched facility, this synchronisation pulse is supplied to the device to be genlocked via a coax cable from the Sync Pulse Generator (SPG).  All genlocked devices are in sync with the SPG and with each other.  Cable lengths and other processing delays cause the devices’ timing to be offset from each other, but this is corrected by manually adjusting the genlock timing of each device, so that their video signals are in exact sync when they reach the vision mixing unit (VMU).

It is important to note this difference: in an IT system it is desirable that all devices be in absolute sync with a master clock, whereas in television, video sources are timed relative to one or more switching devices such as Vision Mixers.

In a circuit-switched world, 1 microsecond is still a long time.  In a hypothetical packet-switched TV outside broadcast, a live camera feed might traverse 3 Ethernet switches before reaching the Vision Mixing Unit (VMU) where it must be in sync with all the other live video feeds to be mixed.  If each switch has a maximum latency of 1.5 microseconds, the total delay for that camera would be 4.5 microseconds; about a third of an HD line.  For a television engineer this is quite a big delay, but it is manageable if most other video sources have a similar delay and if they can accurately synchronise their genlock offset to compensate.
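This latency budget is easy to sketch; the hop count and per-switch latency below are the hypothetical values from the paragraph above, not measurements, and the autotiming window is the Kalypso HD figure cited earlier:

```python
# Hypothetical path: camera -> 3 switches -> VMU.
SWITCH_MAX_LATENCY_US = 1.5   # assumed worst-case per-hop latency
AUTOTIMING_WINDOW_US = 12.0   # Kalypso HD window (+/- 6 us), cited above

def path_delay_us(hops: int) -> float:
    """Worst-case accumulated switch delay over a path."""
    return hops * SWITCH_MAX_LATENCY_US

camera_delay = path_delay_us(3)
assert camera_delay == 4.5

# A source on the VMU's own switch traverses only 1 hop.  The skew
# between the two paths is what the autotiming window must absorb,
# unless the camera advances its genlock offset to compensate.
skew = camera_delay - path_delay_us(1)
assert skew < AUTOTIMING_WINDOW_US
```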

IEEE Audio/Video Bridging Task Group

“Traditional” Ethernet switches, routers and network interface cards (NICs) do not provide any guarantee as to absolute or relative latency.  All traffic is delivered on a “best effort” basis. Traffic flows are dynamic; they fluctuate over time and this affects latency and packet-loss rates.

The Institute of Electrical and Electronics Engineers (IEEE), publishers of the Ethernet standard (IEEE 802.3) has created the Audio Video Bridging (AVB) Task Group to work on standards to "provide the specifications that will allow time-synchronized low latency streaming services through IEEE 802 networks".[15][16][17]  Most of these standards have recently been ratified and Gigabit Ethernet switches that support AVB are now starting to appear.[18][19][20]

It is important to understand that traditional (i.e. non-AVB) Ethernet hardware does not support the enhancements of AVB.  Therefore, to utilise the AVB enhancements, all signal paths carrying audio and video data must be routed over AVB-enabled networking equipment.  However, from a network engineering point-of-view AVB is largely plug-and-play.  The networking intelligence is in the protocols and the hardware. 

Importantly, AVB provides “bounded latency” – a maximum latency guarantee.  For 1 Gigabit Ethernet (GbE) it is about 25µs per link[21].  For 10 GbE it should be ten times lower – about 2.5µs.

Genlock Over the Network

One of the AVB standards is IEEE 802.1AS: Timing and Synchronization for Time-Sensitive Applications[22].  This provides the protocols for Ethernet infrastructure to maintain accurate and fixed genlock for television equipment.

In a circuit-switched world, sync signals (genlock) are distributed on cables separate from the vision (and audio) cables.  Having properly genlocked signals is most important in the case of a Vision Mixer.  Most professional vision mixers will not properly pass a non-sync signal.  Whilst most recent Vision Mixers have an “autotiming window” that allows signals that have some genlock offset to pass, this window is limited in its ability to correct genlock offsets.  The facility engineer must monitor genlock as part of the signal checking process, and correct any genlock timing that is “out of range”.

Different types of television devices are usually on separate control networks.  Only the facility engineer has the overview and the ability to work out where and how to adjust various devices’ genlock timing to suit the VMU.

In a packet-switched world, all devices are on the same network and are able to communicate with each other.  Assuming that the VMU itself is purely an Ethernet device, each camera (or any genlockable device) should be able to automatically measure the network latency from itself to the VMU (using a network “Ping” or similar protocol) and adjust its own genlock offset accordingly.  Any camera feed obtained from the same Ethernet switch that the VMU is directly connected to will be correctly synchronised.  The final Ethernet switch in the path (before the VMU) becomes the “timing plane”. 

This automatic genlock adjustment is achieved in a similar manner to the Network Time Protocol (NTP).  As television systems require nanosecond-level timing accuracy, the AVB Task Group chose a more accurate protocol - Precision Time Protocol (PTP)[23][24] on which to base IEEE 802.1AS[25].
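The offset measurement underlying PTP (and hence IEEE 802.1AS) is standard two-way time transfer; a minimal sketch, with invented timestamps and assuming a symmetric path:

```python
# Two-way time transfer as used by PTP / IEEE 802.1AS.
# t1: master sends Sync          (master clock)
# t2: slave receives Sync        (slave clock)
# t3: slave sends Delay_Req      (slave clock)
# t4: master receives Delay_Req  (master clock)
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Returns (slave clock offset from master, mean path delay),
    assuming a symmetric path."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Illustrative timestamps in nanoseconds: the slave clock runs 500 ns
# ahead of the master and the path delay is 100 ns.  (t4 reads earlier
# than t3 because the clocks disagree by more than the path delay.)
offset, delay = ptp_offset_and_delay(t1=0, t2=600, t3=1000, t4=600)
assert (offset, delay) == (500.0, 100.0)
```

Once each device knows its offset and path delay to the timing reference, it can discipline its local genlock accordingly.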

The concept of providing genlock over a packet-switched network is not new.  Circa 2000, a company called Path 1 released a technology called TrueCircuit[26].  It was able to multiplex uncompressed video over IP and Gigabit Ethernet.  A central feature of the design was the ability to propagate genlock over IP (apparently without latency).  Ten years later, TrueCircuit is unfortunately nowhere to be seen in broadcast facilities.  A product ahead of its time, perhaps.

Reserved Bandwidth

To enable a packet-switched network to adequately replicate the reliability of a circuit-switched network, live video and audio streams need delivery and bandwidth guarantees.  That is, they require a “virtual circuit”[27].  The AVB Task Group has devised a standard called IEEE 802.1Qat: Stream Reservation Protocol (SRP)[28][29] to facilitate this.

SRP enables AVB-capable-Ethernet infrastructure to reserve bandwidth for the transmission of a specific stream, which could be an audio, video or equipment control/monitoring data stream.  SRP provides for Ethernet switches (or “bridges”) to communicate stream reservations to each other along the path(s) of the streams.  These reserved paths are then immune to fluctuations in traffic flows of other unreserved data.  The users of the network can freely use the network for mixed purposes – reserved AV data and other data (such as email / or web traffic) and be assured that the television streams will be undisturbed.  SRP also ensures that it is impossible to saturate the network path from source to destination with reserved streams.  The intermediate switches check their bandwidth allocations before allowing a stream to be established.
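The admission-control behaviour described above can be sketched as a toy model; the link names and capacities are illustrative, and real SRP signalling is considerably richer:

```python
# Toy admission control in the spirit of SRP: each link has finite
# capacity, and a reservation succeeds only if every link on the path
# can accommodate the stream.
class Network:
    def __init__(self, link_capacity_mbps):
        self.free = dict(link_capacity_mbps)  # link name -> free Mbit/s

    def reserve(self, path, stream_mbps):
        """Reserve bandwidth on every link of the path, or refuse."""
        if any(self.free[link] < stream_mbps for link in path):
            return False  # admission refused; no link is modified
        for link in path:
            self.free[link] -= stream_mbps
        return True

# Hypothetical camera-to-VMU path over three 10GbE links:
net = Network({"cam1-sw1": 10_000, "sw1-sw2": 10_000, "sw2-vmu": 10_000})
path = ["cam1-sw1", "sw1-sw2", "sw2-vmu"]
assert net.reserve(path, 2970)       # first 1080p stream admitted
assert net.reserve(path, 2970)       # second admitted
assert net.reserve(path, 2970)       # third admitted (8910 Mbit/s total)
assert not net.reserve(path, 2970)   # fourth would exceed 10 GbE
```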

Other Networking Technologies for Packet-Switched Television

Encapsulation of Digital Video

Existing Society of Motion Picture and Television Engineers (SMPTE) standards covering the transmission of digital video (Serial Digital Interface or SDI) specify dedicated coaxial or fibre cabling for each signal.  Therefore these data streams do not tolerate packet loss, and are not designed with Ethernet or Internet Protocol in mind.  To be transported across a packet-switched network, digital video data streams designed for circuit-switched networks must be encapsulated onto streams compatible with packet-switching.

To enable live video streams to be transported across an unreliable network – that is, a network that does not guarantee delivery of packets in order, or at all, some sort of Forward Error Correction (FEC) must be added to the data.  FEC allows a live data stream to tolerate packet loss without requiring retransmission of data.  FEC does this by embedding redundant data into the stream.
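As a minimal illustration of the principle, a single XOR parity packet per group lets a receiver rebuild any one lost packet without retransmission; real FEC schemes for video are considerably more elaborate:

```python
# Minimal FEC sketch: one XOR parity packet per group of equal-length
# data packets allows recovery of any single lost packet.
from functools import reduce

def parity(packets):
    """XOR all packets together, byte-wise."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

group = [b"AAAA", b"BBBB", b"CCCC"]   # illustrative payloads
fec = parity(group)                    # transmitted alongside the group

# Suppose packet 1 is lost in transit; XOR the survivors with the parity:
recovered = parity([group[0], group[2], fec])
assert recovered == b"BBBB"
```

The redundant data costs bandwidth (here one parity packet per three data packets), which is the trade-off FEC makes against retransmission delay.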

The Video Services Forum (VSF) has undertaken to design a standard for encapsulation and FEC of television signals.  It calls this proposed standard High Bit Rate Media Transport (HBRMT)[30].

HBRMT has been designed for contribution-style video links across networks not necessarily under the control of the broadcaster and not supporting standards such as Audio Video Bridging.  In a live production facility that uses AVB-capable Ethernet, FEC becomes less important, as signal streams should have reserved bandwidth and not suffer packet loss.  The HBRMT protocol itself would still be useful as a standard to encapsulate video signals.  Hopefully it will be possible to vary the FEC rate of different streams on HBRMT.  Streams that stay inside the facility should be able to be transported with zero FEC, since error correction should not be needed there.

Internet Protocol Version 6

Internet Protocol Version 6 (IPv6) is a replacement for IPv4, the Internet addressing system in current use.  Global IPv4 address space is rapidly diminishing as more and more devices connect to the Internet; however, the changeover to IPv6 is currently proceeding slowly.

Assuming that uncompressed live video streams are transported on top of the IP protocol, there is very little reason not to implement AVB-enabled television systems as all-IPv6 from the outset.

Live video and audio transmission in television production facilities at present is almost entirely done via circuit-switching.  The packet-switched networks that do exist are mostly used for device monitoring, control and file sharing, and are usually IPv4 on Ethernet.  Switching live video streams distributed over Ethernet would be an entirely new technology with few or no significant legacy IPv4 systems to support.

When a television broadcaster or facilities company upgrades its equipment, the studio or truck is usually built or rebuilt from scratch.  The television industry is used to having to perform complete upgrades of facilities.  It has done so many times recently when upgrading from analog to digital, and then again when changing from Standard Definition to High Definition.  Changing over to all-Ethernet-based video distribution would just be another “standard” changeover.

Television production facilities are mostly self-contained.  There is some sharing of data (vision) between different facilities, but for instance, most outside broadcasts have only one “Main Program” output which is fed to the host television network for broadcast.  In networking terms, each facility could be thought of as a largely self-contained LAN, with limited, controlled connections to the outside world.  So there are no significant legacy IPv4 interconnections to support.

The benefits of using IPv6 include a much larger address space than IPv4.  In IPv6, even small private LANs have their own 64-bit address spaces, larger than the entire 32-bit address space of IPv4.[31]  An IPv6 private LAN has several orders-of-magnitude more addresses available than the entire IPv4 Internet has today.  Why would a private LAN need so much space?  With that much address space a network administrator never needs to centrally allocate addresses in order to conserve address space.  The 64-bit address space of a subnet in IPv6 is also much larger than the global 48-bit MAC address space[32] allocated to Ethernet for Network Interface Cards (NICs).  There are more addresses in a private IPv6 subnet than there are Ethernet interfaces in the entire world.  In IPv6, network interfaces can configure themselves with their own MAC address as part of their IPv6 address and not conflict with any other IPv6 address[33].  Under IPv6 engineers in the television industry should get used to dynamic IP addressing and not ever having to worry about static addressing.
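The MAC-based self-configuration mentioned above uses the modified EUI-64 scheme from the IPv6 addressing architecture; a sketch, where the prefix and MAC address are hypothetical:

```python
# Stateless address autoconfiguration sketch: derive the 64-bit
# interface identifier from a 48-bit MAC (modified EUI-64): insert
# ff:fe in the middle and flip the universal/local bit.
import ipaddress

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    b = bytearray(int(x, 16) for x in mac.split(":"))
    b[0] ^= 0x02  # flip the universal/local bit of the first octet
    iid = bytes(b[:3]) + b"\xff\xfe" + bytes(b[3:])
    net = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(net.network_address)
                                 + int.from_bytes(iid, "big"))

# Hypothetical camera NIC configuring itself on the link-local prefix:
addr = slaac_address("fe80::/64", "00:11:22:33:44:55")
assert addr == ipaddress.IPv6Address("fe80::211:22ff:fe33:4455")
```

Because the MAC is globally unique, the derived address cannot collide with another autoconfigured interface on the same subnet.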

Because of this, separate IPv6 LANs can be merged and split with relative ease compared to IPv4.  In the broadcast industry, this would mean two or more OB trucks could merge their networks for a job (or just share certain parts with each other), auto-discover and share their facilities, then go their separate ways for the next job.

IPv6 makes multicast support mandatory, whereas in IPv4 it is optional.  Multicasting is a one-to-many method of transmitting data, and is essential for efficient bandwidth use when dealing with live, uncompressed video streams.  Just as video routers or video distribution amplifiers split video signals for distribution, multicasting allows Ethernet switches to split video streams.
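At the Ethernet layer, each IPv6 multicast group maps deterministically onto a destination MAC address (per RFC 2464), which is what lets switches replicate frames only to subscribed ports; a sketch, with a hypothetical group address:

```python
# IPv6 multicast group -> Ethernet destination MAC (RFC 2464):
# the MAC is 33:33 followed by the last 32 bits of the group address.
import ipaddress

def multicast_mac(group: str) -> str:
    last4 = ipaddress.IPv6Address(group).packed[-4:]
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + last4)

# Hypothetical group address carrying one camera's video stream:
assert multicast_mac("ff3e::4321:1234") == "33:33:43:21:12:34"
# The well-known all-nodes group:
assert multicast_mac("ff02::1") == "33:33:00:00:00:01"
```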

Ultra-high Bandwidth Transceivers

While Ethernet switches currently support up to 100Gbps bandwidth, 100Gbps Ethernet Network Interface Cards (NICs), which would allow non-switch devices to deserialise and actually process and manipulate the data, do not yet exist.  However, reprogrammable silicon chips (FPGAs) have been developed that can handle 100-400Gbps applications.[34]  These should be able to take a single 100Gbps or multiple bonded 100Gbps Ethernet links and deserialise the data stream into many individual uncompressed video (and audio and control data) signals inside devices such as Vision Mixers, Video Recorders, Multiviewers, etc.

Impact on Broadcast Systems

The workflows and work practices in current use in live television production would need to be maintained.  Production crews should either notice no difference, or an improvement in workflow when working with an Ethernet-based facility.  This means that Ethernet-based broadcast equipment and workflows in use should look, feel and operate the same as circuit-switched equipment.

Cameras and Camera Control Units

Broadcast cameras have high-spec image sensors, a range of usability features to allow for prolonged outdoor use, and sophisticated monitoring and control systems.  Ethernet-based cameras must provide all these features.  Circuit-switched cameras are connected to a Camera Control Unit (CCU) by a single triax cable of up to approximately 2 km, or by a SMPTE 311M hybrid single-mode-fibre/copper cable.  The CCU does the job of multiplexing the camera’s vision output with return vision (allowing the camera operator to see the Program feed), genlock, audio, communications and camera monitoring and control data.  There are operators inside the truck, called CCU operators, whose job it is to monitor the camera pictures and adjust, in real time, the picture exposure and colour quality.  This is done via remote control of the camera’s iris, shutter, red, green and blue levels and a myriad of other controls.

Cameras can continue to use SMPTE 311M hybrid fibre/copper cables, but instead of carrying proprietary signals, the cables will simply become 10Gbit/s Ethernet fibre cables.  The camera connects to a switch using the 10GBASE-LR fibre interface, with power supplied on the copper pairs.  The camera head should also have at least two 10GBASE-T copper connectors for local monitoring and control.

In an Ethernet-based system, the CCU is redundant.  Ethernet itself becomes the multiplexing medium.  Camera video outputs are obtained from the camera head directly, via the network.  The camera Remote Control Panels (RCPs), used for iris and other controls, are likewise networked to the camera heads directly.  The camera heads obtain return vision, communications, tally, program sound and other signals over the network based on information supplied from central management software.  It would make sense to devise a method for camera heads to automatically obtain these feeds soon after plugging in and booting up, DHCPv6 being the obvious candidate.

Television Signal Routers and Distribution Amplifiers

These devices are redundant in an Ethernet-based facility.  The Ethernet switches and network(s) themselves act as a distributed Vision/Audio/Data router.  The ability to distribute signals around a facility is not limited by input / output connectors, but rather by the bandwidth of and number of ports on Ethernet switches.  As previously mentioned, Multicasting allows the Ethernet switch to split live video streams as needed, in the most efficient manner.

Vision Mixers

VMUs should remain largely unchanged in appearance and operation in an Ethernet-based facility.  The main difference will be the number of connectors on the rear of the main processing frame.  A VMU could process 33 1080p inputs and outputs over a single 100GbE link.  More 100GbE links could be added and bonded to increase bandwidth.  The engineering challenge is how to design the internal architecture to receive and transmit all video and control data over a common 100GbE link or links.

A circuit-switched VMU can have 80 or more video inputs, but with 4 Primary/Secondary Mix/Effects (ME) rows, each with 4 keyers (fill and key), a VMU can have an approximate maximum of 48 sources in simultaneous on-air use (not counting physical auxiliary outputs, which are arguably redundant in an Ethernet-based system).  An Ethernet Vision Mixer could take this into account in its internal architecture for reasons of bandwidth efficiency.

Video Replay Devices

The unofficial industry standard in live video replay devices is the EVS XT series[35].  This device currently has 6 codec channels that can be configured as either input or output channels, and an internal hard drive RAID array for storage.  XT series servers compress the recorded video for internal storage and for network transfer to other EVS devices and to third-party devices.  For live playback, video is uncompressed.  In an all-Ethernet world, these devices would remain essentially the same, but with changes similar to those described for Vision Mixers: the forest of connectors on the rear panel is replaced with a single 100GbE connector, and all data into and out of the device is multiplexed over this connector.  This includes compressed and uncompressed video as live streams, and clipped footage as files.

Control surfaces such as LSM remotes and IP Directors should use Ethernet as their sole connection to the XT or equivalent.

Having access to a 100GbE network could prompt some design philosophy changes.  The limits imposed by the rear connector real-estate no longer apply and economies could be introduced by increasing the codec capacity of a single replay server – i.e. more than 6 or 8 channels.  Also, offloading the storage onto separate, network-based storage arrays may make sense.

Multiviewers and Signal Processing Gear

Multiviewers take multiple video signals and arrange them into a single-screen layout for monitoring.  Signal Processors are devices such as Aspect Ratio Converters, Color Correctors, Up/Down Converters and the like.  Multiviewers and Signal Processors share some common features in that they take live video signals, alter them in some way, and output them (with minimal latency).  In an Ethernet-based system, it would make sense to take advantage of the multiplexing ability of 100GbE and design equipment frames to make maximum use of the bandwidth.  This minimises external physical connectors.  A signal-processing equipment frame should hold up to 30 processing cards, with all signal I/O multiplexed via a single 100GbE connector.

A multiviewer frame would have two 100GbE connectors bonded together as a single interface to allow for a total of 66 1080p video signals as inputs.  Outputs are not limited by physical connectors, but instead by number of image processors.


Vision monitors can be pure Ethernet devices, with 10GbE ports to enable them to display at least one 1080p uncompressed signal.  Monitors designed for a production control room monitor wall should receive and display tally status via Ethernet.  They could also feature inbuilt 2-port 10GbE switches to allow a monitor-to-monitor daisy-chain topology to simplify cabling.  Cabling could also be simplified with Power-Over-Ethernet (PoE).

Monitors designed for operator positions (within arm’s reach) could feature multitouch screens and embedded, mobile-style operating systems.  These onboard OSes would be used as “router panels”, selecting signal sources to the monitor and to recording devices, largely being a replacement for traditional router panels.  The monitor OS could also host operator-specific “apps” to assist with the job role.  A vision monitor with access to a data network can receive telemetry from devices on the network and overlay this data on the relevant vision, for instance a “CCU” operator could overlay camera parameters over the vision for that camera, a replacement for the “Pix Output” of a CCU.

Cabling Topology

Ethernet offers the opportunity to multiplex almost every signal in a broadcast facility onto a common medium.  Cameras, vision and audio monitors, microphones, communications feeds, control and telemetry data can all use the same cabling and same network switches.

Besides the obvious efficiencies gained by doing this, some not-so-obvious features will become apparent.  Having absolutely every device connected to an Ethernet network allows monitoring the health of each device, and of the system as a whole, without having to run any extra cabling.  In many circuit-switched facilities, cables are installed only for the important connections: video and audio I/O, control panels, and so on.  Connections such as General Purpose Input (GPI) or Peripheral Bus (PBus) are often overlooked for many devices, being deemed unimportant.  Over Ethernet, any device could implement a “virtual GPI” to trigger, or be triggered by, any GPI on any other device.
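A “virtual GPI” needs nothing more than an agreed datagram format.  The five-byte layout below (device id, GPI number, contact state) is hypothetical, but it illustrates how trivial such a trigger becomes once everything shares the network:

```python
import struct

# A hypothetical "virtual GPI" wire format: 2-byte device id, 2-byte GPI
# number, 1-byte contact state, carried as a small datagram between devices.
def encode_gpi(device_id: int, gpi: int, closed: bool) -> bytes:
    return struct.pack("!HHB", device_id, gpi, 1 if closed else 0)

def decode_gpi(payload: bytes):
    device_id, gpi, state = struct.unpack("!HHB", payload)
    return device_id, gpi, bool(state)

# Device 7 closes its GPI 2; any interested device can decode the trigger.
msg = encode_gpi(device_id=7, gpi=2, closed=True)
print(decode_gpi(msg))  # -> (7, 2, True)
```

Sent to a multicast group, the same five bytes could trigger any number of listening devices at once.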

All broadcast facilities would make use of a “core” switch.  This is a switch with predominantly 100Gbps ports.  These ports would be used to connect to “prime” production equipment such as vision mixers, video replay servers, multiviewers, and video processing frames.  Any equipment that has more than three 1080p video inputs or outputs would benefit from a 100Gbps connection to the core switch.

Ethernet allows for automatic use of redundant paths, and for routing around a failed path, through the Spanning Tree Protocol (STP) and related protocols[36][37][38].  This allows switches to be distributed around the facility, each connected by two links to the rest of the network.  The loss of one link will be automatically routed around in about three seconds.

In an outside broadcast, this gives engineering confidence in locating switches out in the field.  It would be ideal to place one or more switches in locations central to most of the cameras.  Each switch in the field would have two redundant 100Gbps paths back to the main OB control unit, and a number of 10Gbps ports for connections to cameras and other equipment.  This greatly reduces the amount of signal cable that needs to be run.


Fig. 2. Ethernet-based broadcast facility


The 100Gbps connections inside the main facility would only need to be short range ports. The 100GBASE-SR10 standard allows runs of 125m over multimode fibre.  The 100Gbps connections between the core switch and the switches in the field would need to be longer range.  The 100GBASE-LR4 standard allows links of 10km over singlemode fibre.  The cameras, field switches and main facility should also feature numerous 10GBASE-LR ports, which allow 10Gbps runs of 10km.

When long-range 100Gbps connections become common in the telecommunications industry, it will become very feasible for entire outside broadcasts to be controlled from a central location.  All of the camera and audio feeds could be sent over a 100Gbps link back to a central studio control room.




Fig.3. Ethernet-based outside broadcast

Configuration/Control Server

As previously mentioned, a desirable aspect of a complex system that changes constantly with the demands of different clients is a powerful control system that is nonetheless easy to operate.  Information about the present state of the system should be easy to obtain and presented in ways that make simple sense.  Ideally, there should be a unified and open approach to control and monitoring protocols.  With all systems on the same “wire”, the old physical barriers to interconnections no longer exist, and open protocols should win out over proprietary ones.  It should be possible to autodetect, configure and control all systems from a central configuration interface, similar to the Virtual Studio Manager (VSM)[39] system by LSB Broadcast Technologies.

Use of the Link Layer Discovery Protocol (LLDP) and the Simple Network Management Protocol (SNMP) will help administer the Ethernet/IP network, but these protocols may not be sufficient to provide control of all broadcast systems.  Some parameters of broadcast equipment need to be continually monitored and adjusted at the frame rate of the video, i.e. at least 30 times a second.  Picture exposure, colorimetry and similar parameters in cameras and color-correcting processors are monitored and controlled in real time, with less than 20 milliseconds of latency.  SNMP is not recommended for this sort of control.  It is preferred that as many broadcast systems as possible have real-time, low-latency control and feedback of operational parameters.
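The 20 millisecond budget is easy to reason about with a toy control loop.  In this sketch, loopback UDP stands in for the camera and the payload format is purely illustrative; the loop pushes one parameter update per video frame and records the worst delivery time:

```python
import socket
import time

# Toy frame-rate control loop: push one "iris" update per video frame and
# record the worst delivery time.  Loopback UDP stands in for the camera.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

FRAME_PERIOD = 1 / 30  # 30 updates per second, the floor quoted above
worst = 0.0
for frame in range(30):  # one second of control traffic
    start = time.monotonic()
    tx.sendto(b"iris=%d" % frame, rx.getsockname())
    rx.recvfrom(64)  # the "camera" consumes the update
    worst = max(worst, time.monotonic() - start)
    time.sleep(max(0.0, FRAME_PERIOD - (time.monotonic() - start)))

tx.close()
rx.close()
assert worst < 0.020  # every update arrived within the 20 ms budget
```

On a switched LAN the delivery time is dominated by switch latency, which as discussed earlier is measured in microseconds, leaving ample headroom inside the 20 ms target.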

The configuration software and server system would combine network management and broadcast systems management capabilities.  An example of its operation would be the case of a camera being plugged into the network.  For security reasons, the camera’s Ethernet MAC address would be checked against a “whitelist” of known allowable MAC addresses.  If the MAC address is not on the list, an alert would pop up on the Engineer-In-Charge’s (EIC) terminal screen asking whether or not to allow that device onto the network.  If the EIC allows the camera (or other device) onto the network, the camera advertises its abilities and requirements to the server.  The camera auto-configures its own IPv6 address and makes a DHCPv6 request, not only for DNS and default gateway addresses, but also for broadcast signals on the network.  The DHCP server tells the camera where to find the return vision, program sound, talkback and control signals, among others.  The DHCP server (and/or other discovery protocols) passes on information about the identity of the camera and its capabilities to other devices on the network.
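The admission step described above can be sketched in a few lines.  The whitelist contents and the EIC-approval callback below are illustrative stand-ins for the management database and the alert on the EIC’s terminal:

```python
# A sketch of the admission step: the whitelist contents and the
# EIC-approval callback are illustrative stand-ins for the management
# database and the alert dialog on the EIC's terminal.
WHITELIST = {"00:1b:63:aa:10:01", "00:1b:63:aa:10:02"}

def admit(mac: str, eic_approves=lambda mac: False) -> bool:
    """Allow a device onto the network if whitelisted, else ask the EIC."""
    mac = mac.lower()
    if mac in WHITELIST:
        return True
    # Unknown MAC: the EIC decides, and approvals are remembered.
    if eic_approves(mac):
        WHITELIST.add(mac)
        return True
    return False

assert admit("00:1B:63:AA:10:01") is True            # known camera
assert admit("de:ad:be:ef:00:01") is False           # unknown, EIC declines
assert admit("de:ad:be:ef:00:01", lambda m: True)    # EIC approves this one
assert admit("de:ad:be:ef:00:01") is True            # now remembered
```

Once admitted, the device proceeds to IPv6 autoconfiguration and the DHCPv6 exchange described above.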

The management software must also tightly control bandwidth allocation.  All video signals on the network are monitored or used on-air almost 100% of the time, so there is no room for any oversubscription of the bandwidth.  Live video streams (and control data) cannot tolerate being slowed down or having packets dropped when the available bandwidth is saturated; in a live environment, bandwidth saturation must never occur.  As mentioned previously, the IEEE 802.1Qat Stream Reservation Protocol automatically prevents bandwidth saturation.  However, some manual intervention would be needed to develop a priority list, as some signals will have precedence over others on the network.  A “first come, first served” policy on bandwidth allocation may not be sufficient.
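The priority-list idea can be modelled as admission control that refuses, or pre-empts, reservations rather than ever letting the link saturate.  This is a toy model of the policy, not the actual 802.1Qat state machine; lower numbers mean higher priority:

```python
# Toy model of SRP-style admission control with a manual priority list.
# Illustrative only -- not the real 802.1Qat talker/listener protocol.
class Link:
    def __init__(self, capacity_gbps: float = 100.0):
        self.capacity = capacity_gbps
        self.streams = []  # (priority, name, gbps); lower number = higher priority

    def reserve(self, priority: int, name: str, gbps: float) -> bool:
        used = sum(s[2] for s in self.streams)
        if used + gbps <= self.capacity:
            self.streams.append((priority, name, gbps))
            return True
        # Not "first come, first served": pre-empt strictly lower-priority
        # streams (largest priority number first) if that frees enough room.
        victims = sorted((s for s in self.streams if s[0] > priority),
                         key=lambda s: -s[0])
        freed, chosen = 0.0, []
        for v in victims:
            chosen.append(v)
            freed += v[2]
            if used - freed + gbps <= self.capacity:
                for c in chosen:
                    self.streams.remove(c)
                self.streams.append((priority, name, gbps))
                return True
        return False  # refuse rather than saturate the link

link = Link(capacity_gbps=10.0)
assert link.reserve(5, "iso-cam-1", 2.97)
assert link.reserve(5, "iso-cam-2", 2.97)
assert link.reserve(5, "iso-cam-3", 2.97)
assert not link.reserve(5, "iso-cam-4", 2.97)   # same priority: refused
assert link.reserve(1, "program-out", 2.97)     # higher priority pre-empts
```

The key property is the last two lines: an equal-priority stream is simply refused, while a program-critical stream displaces an ISO camera rather than degrading everything on the link.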

Cost Comparison

It is interesting to analyse and compare the costs of traditional television signal distribution with those of entirely Ethernet-based distribution.  To do this, one should identify all the costs in traditional television facilities that would be replaced by Ethernet.  As mentioned above, some of these are:

  • Signal routers, including video, audio and data.
  • Communications matrices and/or nodes (but not talkback panels)
  • Distribution amplifiers (DAs)
  • Camera Control Units (CCUs)
  • Signal multiplexing equipment such as SDI-Fibre converters
  • Audio-over-fibre equipment (but not audio analog-to-digital converters)
  • Most of the cabling associated with these equipment items
  • Large numbers of physical connectors, and the silicon associated with driving or receiving SDI and AES over coaxial and balanced cable for each connection.
  • The labor costs of running and terminating the associated cabling in permanent facilities, and at least 50% of cable runs on temporary outside broadcasts.

The above-listed equipment is replaced by AVB-capable Ethernet switches of varying capacities, the associated fibre modules, and Network Interface Cards (NICs) on devices that generate, process or display television signals.

There are currently no “big ticket” items of broadcast video equipment that generate, process or display uncompressed video via AVB-capable Ethernet.  But to analyse the cost comparison, one can imagine that hypothetical AVB television equipment does exist, and that it costs roughly the same as traditional, circuit-switched equipment.  In the longer term, AVB enhancements should not add significantly to the cost of Ethernet equipment, as the changes are at the silicon level only.  The comparison below compares only the costs of signal distribution within a broadcast facility.

With a bit of Internet searching, approximate dollar values can be obtained for most equipment items of interest.  To compensate for error and general bias, prices quoted here are adjusted in favour of traditional equipment, and against Ethernet equipment.

A large-ish broadcast facility would have the equipment specified in Table 1.

Table 1  Traditional broadcast distribution costs

Equipment type                          Approx. 2010 price            Total price

Video router (576x576 ports)            $750 per port[40]
Distribution amplifiers                 $     500 each[41]            $  25,000
Video-over-fibre equipment
Communications nodes / matrixes         (4 nodes or 1 main matrix)
Cabling, installation costs

Grand Total:

Table 2  Ethernet port costs 2011

Equipment type                          Approx. 2011 price            Total price

100GBASE-LR4 port                       $  60,000[44]
                                        $           0
100GBASE-SR10 port                      $  35,000[45][46]
10GBASE-LR module                       $    3,000[47]
10GbE switch port                       $       750[48][49]
Gigabit Ethernet switch port            $       200[50]               $  28,800

Grand Total:

Table 3  Predicted Ethernet port costs 2015

Equipment type                          Approx. 2015 price            Total price

                                        $   5,000
                                        $   1,500                     $  25,000
                                        $     375                     $  55,500
                                        $      50                     $   7,800

Grand Total:


The equivalent AVB-Ethernet-based facility would use switch ports listed in Table 2.

Note that the current price of 100GBASE-LR4 ports is very high.  It is expected that over the next 5 years the prices of 100GbE ports will drop dramatically, following the price drop of 10GBASE-LR ports over the last 5 years; Ethernet ports have historically dropped 35% in price every year.[51]  This will be especially important for television outside broadcasting, as it will be feasible to place switches outside the broadcast truck (or facility), closer to the cameras and microphones.
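Compounding the quoted 35% annual decline forward from the 2011 100GBASE-LR4 price in reference [44] gives a feel for the projection:

```python
# Compounding the quoted 35% annual price decline forward from the 2011
# 100GBASE-LR4 transceiver price in reference [44].
price = 60_000  # approx. 2011 price, USD
ANNUAL_DROP = 0.35

for year in range(2011, 2015):   # four years of decline to 2015
    price *= (1 - ANNUAL_DROP)

print(round(price))  # -> 10710 : roughly a sixth of the 2011 price
```

A steeper decline, which is common once a new port speed reaches volume production, would be needed to reach the prices assumed in Table 3.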

Table 3 details the possible cost of signal distribution in an Ethernet-based facility in the year 2015.


The idea of using Ethernet as the backbone of a live television production facility is bound to meet resistance.  After all, circuit-switched production has been a successful method for decades.  Signal flow is simple to follow, behaves predictably and has multiple redundancy built in – in most places.  However, building and configuring a circuit-switched facility from scratch is a very manual process; hundreds, if not thousands, of physical cables need to be run and connected.  The subsequent software configuration process largely involves documenting and naming the physical connections that have been made for that particular facility, with overlapping information having to be programmed into separate systems.

Ethernet as the backbone in a television facility opens up significant efficiencies.  The multiplexing ability of Ethernet greatly reduces the number of physical connections that need to be made.  Ethernet also enables automatic detection of the state of the system and the sharing of information and control data between any of the systems on the network.  Only information that cannot be detected on the network, such as signal naming, need be programmed in – and then, only into one system.  This level of connectedness is very difficult to achieve in a circuit-switched system.

Circuit-switched television facilities generally rely on manual fail-over systems to put redundant systems on-line in case of a failure.  It can sometimes take minutes to patch around faulty equipment to restore live pictures.  Ethernet features protocols that enable redundant paths to be utilised within seconds of a failure.

Once an Ethernet backbone in television proves itself to be equal or superior to a circuit-switched system in ability and reliability, the deciding factor will be cost.  The core technologies inside equipment such as cameras, mixers and monitors will not change, and so for the most part, their costs will not change.  What will change is the cost of the “glue” that connects them all together, and the costs involved in rolling out that glue for broadcast facilities.  It is important to note that Ethernet and the Audio Video Bridging enhancements are open IEEE standards, and are not tied to any specific vendor.  Therefore, the core technologies involved will continue to follow the price curve of Ethernet, a commoditised technology.  Perhaps the most important saving is in time.  Equipment that takes less time to deploy has a shorter job-cycle time and can therefore do more jobs in a given time period, resulting in a quicker return on investment.

From this it is not much of a stretch to conclude that if the promise of Ethernet-based live television production is realised, broadcast companies that utilise it will have a significant competitive advantage over companies that continue to only use circuit-switched systems.  Even companies skeptical about the feasibility of Ethernet in television production should not ignore the idea completely.

Further Reading


[1] “10-Bit 4:2:2 Component and 4fsc Composite Digital Signals – Serial Digital Interface”, SMPTE 259M-1997 (Revision of 259M-1993), <>, Retrieved 26 January 2011 

[2] Communications Specialties, Inc., “HDTV Standards and Practices for Digital Broadcasting”, <>, Retrieved 26 January 2011 

[3] “10BASE-T”, In Wikipedia, <>, Retrieved 26 January 2011 

[4]  Winterford, B, 28 July 2009,  “Nextgen to wire sports stadiums with fibre network”, iTnews, Haymarket Media Pty Ltd,  <,nextgen-to-wire-sports-stadiums-with-fibre-network.aspx>, Retrieved 26 January 2011

[5]  Nextgen Networks, “Nextgen Delivers Game Changing Content Delivery Network for Premier Media Group”, <>, Retrieved 26 January 2011

[6] Merli, J. 12 November 2009, “A Lossless Season”, TV Technology, NewBay Media LLC, <>, Retrieved 26 January 2011 

[7] E. Raja Simhan, “Cisco broadcasts HD video images of the Commonwealth Games”, The Hindu Business Line, < >, Retrieved 23 June 2011

[8] “IPTV”, In Wikipedia, <>, Retrieved 26 January 2011 

[9] Cisco Systems Inc., “CCTV on IP Network”, <>, Retrieved 26 January 2011 

[10] Newman H, 18 June 2009, “Falling 10GbE Prices Spell Doom for Fibre Channel”, Enterprise Storage Forum, QuinStreet Inc., <>, Retrieved 26 January 2011 

[11] Gillette, G,  Kovac, R & Pelipenko, I, 31 July 2000, “Sizing up the Gigabit Ethernet switch players“, Network World,  <>, Retrieved 26 January 2011 

[12] Newman, D, 18 January 2010, “Latency and jitter: Cut-through design pays off for Arista, Blade“, Network World, <>, p. 2, Retrieved 26 January 2011 

[13] Grass Valley Group, “8943RDA SD/HD/3G Reclocking Distribution Amplifier”, <>, Retrieved 26 January 2011 

[14] Grass Valley Group, November 2008, “Kalypso HD/Duo Video Production Center Installation and Service Manual Software Version 15.1”, <>, Retrieved 26 January 2011

[15] Institute of Electrical and Electronics Engineers, “Audio/Video Bridging Task Group”, <>, Retrieved 26 January 2011 

[16] Teener, MJ, 24 August 2009, “No-excuses Audio/Video Networking: the Technology Behind AVnu”, AVnu Alliance, <>, Retrieved 26 January 2011 

[17] Edwards, T, 28 June 2010, “Uncompressed Video Over IP”, TV Technology, NewBay Media LLC, <>, Retrieved 26 January 2011 

[18] Lab X Technologies, “Titanium 411 Ruggedized AVB Ethernet Bridge“, <>, Retrieved 27 February 2011 

[19] BSS Audio, “Ethernet AVB Products”, <>, Retrieved 25 March 2011 

[20] EKF Elektronik GmbH, “CL1-COMBO 5+1Ports Gigabit Ethernet Switch”, <>, Retrieved 27 March 2011 

[21] Teener, MJ, “Time Awareness for Bridged LANs: IEEE 802.1 Audio Video Bridging”, Joint ITU-T/IEEE Workshop on The Future of Ethernet Transport,  <>, p. 6, 28 May 2010, Retrieved 27 June 2011 

[22] Institute of Electrical and Electronics Engineers, “802.1AS - Timing and Synchronization”, <>, Retrieved 26 January 2011 

[23] Eidson, J, 10 October 2005, “IEEE-1588 Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems”, Agilent Technologies, <>, Retrieved 26 January 2011 

[24] Skoog, P & Arnold, D, December 2004, “Nanosecond-Level Precision Timing Comes to Military Applications”, COTS Journal, <>, Retrieved 3 February 2011 

[25] Teener, MR & Macé, G, 1 June 2008, “Using Ethernet in the HD Studio”, Broadcast Engineering, Penton Media, Inc., <>, Retrieved 05 May 2011 

[26] Palmer, DA, Fellman, RD & Moote, S, August 1 2001, “Path 1/Leitch TrueCircuit QoS technology”, Broadcast Engineering, Penton Media, Inc., <>, Retrieved 3 February 2011 

[27] “Virtual Circuit”, In Wikipedia, <>, Retrieved 3 February 2011 

[28] “802.1Qat – Stream Reservation Protocol”, Institute of Electrical and Electronics Engineers, <>, Retrieved 3 February 2011 

[29] Teener, MR & Macé, G, 1 May 2008, “Ethernet in the HD Studio”, Broadcast Engineering, Penton Media, Inc., <>, Retrieved 3 February 2011 

[30] Video Services Forum, “HBRMT Encapsulation and FEC Committee”, <>, Retrieved 3 February 2011 

[31] Van Beijnum, I, 7 March 2007, “Everything you need to know about IPv6”, Ars Technica, Condé Nast Digital, <>, p. 2, Retrieved 3 February 2011 

[32] Mitchell, B, “The MAC Address, An Introduction to MAC Addressing”, The New York Times Company, <>, Retrieved 3 February 2011 

[33] Donzé, F, June 2004, “IPv6 Autoconfiguration”, The Internet Protocol Journal 7.2, Cisco Systems Inc.,<>, Retrieved 3 February 2011 

[34] Xilinx, Inc., 17 November 2010, “Xilinx Virtex-7 HT Devices Enable 100-400Gbps Applications and Beyond in a Single FPGA for Next Generation Communication Systems”, <>, Retrieved 3 February 2011

[35] EVS Broadcast Equipment, “XT2+”, <,+Middle+East+Africa/English/Products/Products-Catalog/Ingest-Production-and-Playout-Servers/XT-2-/page.aspx/2403>, Retrieved 27 February 2011 

[36] Schluting, C, 14 August 2007, “Networking 101: Understanding Spanning Tree”, Enterprise Networking Planet, QuinStreet Inc., <>, Retrieved 27 February 2011 

[37] Cisco Systems Inc., 17 April 2007, “Understanding Multiple Spanning Tree Protocol (802.1s)”, <>, Retrieved 27 February 2011 

[38] Cisco Systems Inc., 24 October 2006, “Understanding Rapid Spanning Tree Protocol (802.1w)”, <>, Retrieved 27 February 2011

[39] L-S-B Broadcast Technologies Gmbh., “VSM – Virtual Studio Manager”,  <>, Retrieved 27 February 2011 

[40] Billat, S, 1 Mar 2008, “Delivering Quality Real-Time Video Over IP”, Broadcast Engineering, Penton Media Inc., <>, Retrieved 27 February 2011 

[41], “HD SDI DA’s”, JP Claude, Inc., <>, Retrieved 27 February 2011 

[42] Sony Electronics Inc., “HDCU1000L  HD Camera Control Unit”, <>, Retrieved 21 May 2011 

[43], “Telecast Python 2 P2-TX8-13CW-ST Encoder and/or Decoder”, JP Claude, Inc., <>, Retrieved 27 February 2011 

[44], “Finisar FTLC1181RDNS Duplex CFP Transceiver”, <>, Retrieved 05 May 2011 

[45] Duffy, J, 26 February 2010, “Engineers Demand Price Drop for 100G Ethernet”, Techworld, International Data Group, <>, Retrieved 27 February 2011 

[46], “Finisar FTLQ8181EBLM Duplex CFP 40GBASE-SR4 40G Ethernet transceiver”, Quoted price $3420 – 100GBASE-SR10 could cost 10 x the price of 40GBASE-SR4, <>, Retrieved 05 May 2011 

[47], “Cisco 10GBASE-LR SFP Module Transceiver”, <>, Retrieved 27 February 2011 

[48] Solomon, H, 2 April 2008, “Racing Towards 10Gigabit Ethernet”, Techworld, International Data Group,  <>, p. 2,  Retrieved 27 February 2011

[49], “Cisco Nexus 5010 20-Port Ethernet Switch”, <>, Retrieved 27 February 2011

[50], “Cisco Catalyst 3560G 48-Port Gigabit Ethernet Switch (WS-C3560G-48PS-S)”, <>, Retrieved 27 February 2011

[51] Solomon, H, 2 April 2008, “Racing Towards 10Gigabit Ethernet”, Techworld, International Data Group, <>,  Retrieved 27 February 2011