Ethernet and streaming


After reading many interesting posts on Ethernet cables and switches, I thought it would be good to describe how Ethernet and other networking protocols are used in streaming services. This post does not cover the use of USB, AES or coax cables. The material is factual and drawn from other authors. Please feel free to correct any factual inaccuracies. The following topics are covered:

  • Streaming Architecture

  • Media Server

  • Media Renderer

  • Ethernet Data Transmission

  • Ethernet Cable, Noise and Differential Signaling

  • Ethernet Clock

  • Ethernet Switch

  • Summary

welcher

Streaming Architecture:

Network audio is based upon error-free bulk data transfer. It uses the same or similar technology that is used for rendering the web page you are currently reading. Qobuz, Tidal, Roon and UPnP/DLNA have similar architectures. They all use the Ethernet, IP and TCP protocols in the same manner. They differ in their use of the upper level protocols. Since Qobuz, Tidal and Roon are closed architectures, we will use UPnP/DLNA as an example.

Basically, a media server provides media discovery and media transportation to a media renderer, which converts a binary audio file to analog audio. The following diagram depicts the UPnP/DLNA architecture and protocol suite:

[Diagram: UPnP/DLNA architecture and protocol suite]

Media Server:

In this example we will transport Jennifer Warnes' First We Take Manhattan from the media server to the media renderer. The file is in uncompressed 96/24 PCM format and is approximately 133 megabytes in length. For simplicity we will ignore the upper level protocols and concentrate on Ethernet, IP and TCP.
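As a rough sanity check on those figures, the data rate and playing time can be worked out from the format. This is only a sketch: it assumes a stereo file and takes "133 megabytes" to mean 133 x 10^6 bytes, neither of which is stated above.

```python
SAMPLE_RATE_HZ = 96_000
BITS_PER_SAMPLE = 24
CHANNELS = 2                     # assumption: stereo
FILE_SIZE_BYTES = 133_000_000    # assumption: 133 x 10^6 bytes

# Uncompressed PCM: every sample of every channel is stored in full.
bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
bits_per_second = bytes_per_second * 8
duration_seconds = FILE_SIZE_BYTES / bytes_per_second

print(f"Audio data rate: {bits_per_second / 1e6:.3f} Mb/s")
print(f"Playing time: about {duration_seconds:.0f} s")
```

Under these assumptions the audio needs only a few megabits per second, a small fraction of what a 100 Mb/s link can carry.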

The media server uses the TCP process to divide the file into TCP packets (formally called segments). The TCP process forwards each packet to the IP process, which encapsulates the TCP packet in an IP packet. The IP process forwards the IP packet to the Ethernet process, which encapsulates the IP packet in an Ethernet frame and transmits the frame on the Ethernet cable.
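The nesting described above can be sketched with three small classes. This is an illustration only, not how any particular media server is written. The class and function names are made up for this post, and the header sizes assume IPv4 and TCP without options and an Ethernet frame without a VLAN tag.

```python
from dataclasses import dataclass

TCP_HEADER = 20   # bytes, assuming no TCP options
IP_HEADER = 20    # bytes, assuming IPv4 without options
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence appended to every frame

@dataclass
class TcpSegment:
    seq: int          # byte offset of this payload within the file
    payload: bytes

    def size(self) -> int:
        return TCP_HEADER + len(self.payload)

@dataclass
class IpPacket:
    segment: TcpSegment

    def size(self) -> int:
        return IP_HEADER + self.segment.size()

@dataclass
class EthernetFrame:
    packet: IpPacket

    def size(self) -> int:
        return ETH_HEADER + self.packet.size() + ETH_FCS

def encapsulate(data: bytes, max_payload: int = 1460) -> list:
    """Split data into TCP payloads and wrap each one in IP and Ethernet."""
    frames = []
    for offset in range(0, len(data), max_payload):
        segment = TcpSegment(seq=offset, payload=data[offset:offset + max_payload])
        frames.append(EthernetFrame(IpPacket(segment)))
    return frames

frames = encapsulate(bytes(3000))
print(len(frames), [frame.size() for frame in frames])
```

Each layer only adds its own header (and, for Ethernet, a trailer) around whatever the layer above handed down.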

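It is common to limit the TCP/IP packet size so that each packet fits in one Ethernet frame. A standard Ethernet frame carries up to 1,500 bytes, which leaves about 1,460 bytes for audio data once the IP and TCP headers are accounted for. At that size, more than 91,000 frames are required to transmit the file.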
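The frame count can be checked with a few lines of arithmetic. The sketch assumes a file of exactly 133 x 10^6 bytes, a 1,500-byte Ethernet payload, and 20-byte IP and TCP headers with no options; other settings (TCP timestamps, for instance) would change the result slightly.

```python
FILE_SIZE_BYTES = 133_000_000   # assumption: 133 x 10^6 bytes
ETHERNET_PAYLOAD = 1500         # standard maximum payload of one frame
IP_HEADER = 20                  # assumption: IPv4 without options
TCP_HEADER = 20                 # assumption: TCP without options

def frames_needed(file_size: int, payload: int = ETHERNET_PAYLOAD) -> int:
    """Number of frames when each frame carries one full TCP/IP packet."""
    data_per_frame = payload - IP_HEADER - TCP_HEADER
    return -(-file_size // data_per_frame)   # integer ceiling division

print(frames_needed(FILE_SIZE_BYTES))
```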

Media Renderer:

We will now examine the renderer's processing in a little more detail. The renderer contains a CPU, system memory (RAM) and a system bus. It also contains a CPU clock which is used to synchronize operations (executing CPU instructions, moving data to and from memory, moving data onto the system bus).

Each process element (TCP, IP and Ethernet) has a designated space in system memory to perform its work. When data is transferred from one process element to another, the data is copied or moved from one location in system memory to another. We will refer to this movement of data between the process elements as streaming.

The Ethernet process assembles an Ethernet frame from the electrical signals on the cable. Once a frame has been identified, its checksum (the frame check sequence, a 32-bit CRC) is verified. If any of the bits were received in error, the entire frame is discarded. Otherwise the contents of the Ethernet frame are streamed to the IP process.
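The checksum test can be sketched with Python's zlib.crc32, which uses the same CRC-32 polynomial as Ethernet. Real network hardware does this check itself; the function names below are invented for the illustration, and the little-endian packing of the checksum is my understanding of how it appears on the wire, so treat that detail as an assumption.

```python
import struct
import zlib

def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 to the frame."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)

def fcs_is_valid(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the received one."""
    body, received = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == received

def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a single bit received in error."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

frame = add_fcs(bytes(60))
print(fcs_is_valid(frame))                  # intact frame is accepted
print(fcs_is_valid(flip_bit(frame, 100)))   # damaged frame is discarded
```

A single flipped bit anywhere in the frame changes the CRC, so the receiver drops the whole frame rather than passing damaged audio data upward.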

The IP process extracts an IP packet from the Ethernet frame. If the packet is valid and the destination address matches the renderer address a TCP packet is extracted from the IP packet and streamed to the TCP process.
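A minimal sketch of those two IP checks, assuming IPv4 with a 20-byte header and no options. The addresses and helper names are made up for the example. "Valid" is taken here to mean only that the header checksum is correct; a real IP stack checks more than that.

```python
import socket
import struct

HEADER_FORMAT = "!BBHHHBBH4s4s"   # 20-byte IPv4 header without options

def internet_checksum(data: bytes) -> int:
    """Ones' complement sum of 16-bit words, as used by IPv4 headers."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Server side: build a header carrying TCP (protocol 6)."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, 6, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = internet_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)

def accept_packet(header: bytes, my_address: str) -> bool:
    """Renderer side: header checksum must verify and the packet must be for us."""
    checksum_ok = internet_checksum(header) == 0
    destination = socket.inet_ntoa(header[16:20])
    return checksum_ok and destination == my_address

header = build_ipv4_header("192.168.1.10", "192.168.1.20", 1480)
print(accept_packet(header, "192.168.1.20"))
print(accept_packet(header, "192.168.1.99"))
```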

The TCP process verifies the packet checksum to ensure that data has not been corrupted. If the checksum is not verified the packet is discarded. The packet sequence number is compared to the next expected sequence number. If the numbers match the contents of the packet are streamed to the next process in the process chain and the sequence number is acknowledged to the media server. If the received sequence number is less than the expected sequence number the packet is discarded. If the received sequence number is greater than the expected sequence number the packet can be discarded or saved for later processing. The media server will automatically retransmit packets which are not acknowledged within a defined time limit.
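The sequence number handling can be sketched as follows. This is a simplified model, not a real TCP implementation: it assumes sequence numbers count bytes from zero, ignores wrap-around and partially overlapping packets, chooses to save out-of-order packets rather than discard them, and acknowledges by returning the next sequence number it expects. The class name is invented for the example.

```python
class Receiver:
    def __init__(self, initial_seq: int = 0):
        self.expected = initial_seq    # next sequence number we want
        self.delivered = bytearray()   # data streamed to the next process
        self.pending = {}              # early packets saved for later: seq -> payload

    def _deliver(self, payload: bytes) -> None:
        self.delivered += payload
        self.expected += len(payload)

    def on_packet(self, seq: int, payload: bytes) -> int:
        """Process one packet whose checksum was already verified; return the ACK."""
        if seq < self.expected:
            pass                            # already received: discard
        elif seq > self.expected:
            self.pending[seq] = payload     # arrived early: save for later
        else:
            self._deliver(payload)
            # A saved packet may now be the next one in line.
            while self.expected in self.pending:
                self._deliver(self.pending.pop(self.expected))
        return self.expected

rx = Receiver()
for seq, data in [(0, b"AAAA"), (8, b"CCCC"), (0, b"AAAA"), (4, b"BBBB")]:
    print(seq, "-> ack", rx.on_packet(seq, data))
print(bytes(rx.delivered))
```

In the example the packet at offset 4 is late, so the acknowledgement stays at 4 until it arrives. The server would resend it if the acknowledgement did not advance within its time limit.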

After enough data has been accumulated in system memory the process of generating an analog signal from the digital data may begin.


 

Ethernet Data Transmission:

Ethernet cables transmit binary data. The only two values that can be transmitted are 0 and 1. The transmitter converts a bit value into a voltage for transmission on the wire (modulation). The receiver converts the voltage back into a bit value at the other end (demodulation). The sending and receiving stations agree on a clock rate, also known as frequency, which determines how long each ‘instance’ of voltage must be applied. This process of mapping bits to voltages and back is known as line coding. There are several standards for line coding. For this example we will use 4B5B / MLT-3, as used by 100 Mb/s Ethernet.

4B5B is a line coding used to create a data stream that is self-clocking. 4B5B maps groups of 4 bits of data onto groups of 5 bits for transmission. These 5-bit words are defined in a dictionary, and they are chosen to ensure that there will be sufficient transitions in the line state to produce a self-clocking signal. The receiver then maps each group of 5 received bits back into 4 data bits. The following table shows the mapping of data bits to transmitted bits and vice versa.

[Table: 4B5B mapping of 4-bit data values to 5-bit transmitted words]
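The transmit side of the mapping can be written as a small lookup. The dictionary below is the standard 4B5B data code table as I know it. The function name is invented, and sending the low 4 bits of each byte first is my understanding of 100 Mb/s Ethernet, so treat the ordering as an assumption.

```python
# 4-bit data value -> 5-bit word placed on the line
FOUR_TO_FIVE = {
    0x0: "11110", 0x1: "01001", 0x2: "10100", 0x3: "10101",
    0x4: "01010", 0x5: "01011", 0x6: "01110", 0x7: "01111",
    0x8: "10010", 0x9: "10011", 0xA: "10110", 0xB: "10111",
    0xC: "11010", 0xD: "11011", 0xE: "11100", 0xF: "11101",
}

def encode_4b5b(data: bytes) -> str:
    """Return the transmitted bits for the given data bytes."""
    words = []
    for byte in data:
        words.append(FOUR_TO_FIVE[byte & 0x0F])   # low 4 bits first (assumption)
        words.append(FOUR_TO_FIVE[byte >> 4])     # then high 4 bits
    return "".join(words)

print(encode_4b5b(b"\x5a"))

# Every word contains at least two 1 bits, which is what guarantees
# enough transitions on the line for the receiver to recover the clock.
print(min(word.count("1") for word in FOUR_TO_FIVE.values()))
```

The price of the extra transitions is that 5 bits are sent for every 4 bits of data.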


 


 


 


 


 


 


 


 


 

Once the bits to be transmitted have been created, they are converted to voltages using MLT-3 coding. MLT-3 cycles through three voltage levels: -1, 0 and +1. It moves from -1 to 0 to +1, back to 0, then back to -1, repeating indefinitely. The transmitter moves to the next voltage level in the cycle to transmit a one bit and stays at the same level to transmit a zero bit. The following table shows an example MLT-3 transfer:

[Table: example MLT-3 transfer]
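The rule is simple enough to sketch in a few lines. The function name is invented, and the sketch assumes the line is sitting at level 0 and about to move toward +1 when transmission starts; a different starting point would shift the pattern but not change the rule.

```python
CYCLE = [-1, 0, +1, 0]   # the repeating MLT-3 level sequence

def mlt3_encode(bits: str, start_index: int = 1) -> list:
    """Turn a string of '0'/'1' bits into a list of line levels."""
    index = start_index              # position in CYCLE; 1 means level 0 (assumption)
    levels = []
    for bit in bits:
        if bit == "1":
            index = (index + 1) % len(CYCLE)   # a one moves to the next level
        levels.append(CYCLE[index])            # a zero holds the current level
    return levels

for bit, level in zip("1011001011", mlt3_encode("1011001011")):
    print(bit, f"{level:+d}")
```

Because a full cycle of the waveform takes four one bits, the fastest the line can alternate is a quarter of the bit rate.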

The receiving station uses MLT-3 to generate a bit from a line voltage and 4B5B to generate data bits from transmitted bits.
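The receive side can be sketched the same way: a change of level is a one, no change is a zero, and each group of 5 bits is then looked up to recover 4 data bits. The function names are invented. The sketch assumes the line was at level 0 before the first bit and that the low 4 bits of each byte were sent first.

```python
# 5-bit words in order of the data value (0 to 15) they stand for
WORDS = ("11110 01001 10100 10101 01010 01011 01110 01111 "
         "10010 10011 10110 10111 11010 11011 11100 11101").split()
FIVE_TO_FOUR = {word: value for value, word in enumerate(WORDS)}

def mlt3_decode(levels: list, initial_level: int = 0) -> str:
    """Line levels -> transmitted bits."""
    bits = []
    previous = initial_level         # assumption: line idle at level 0
    for level in levels:
        bits.append("1" if level != previous else "0")
        previous = level
    return "".join(bits)

def decode_4b5b(bits: str) -> bytes:
    """Transmitted bits -> data bytes (low 4 bits of each byte arrive first)."""
    values = [FIVE_TO_FOUR[bits[i:i + 5]] for i in range(0, len(bits), 5)]
    return bytes(low | (high << 4) for low, high in zip(values[0::2], values[1::2]))

levels = [1, 1, 0, -1, -1, -1, 0, 0, 1, 0]
bits = mlt3_decode(levels)
print(bits, decode_4b5b(bits).hex())
```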