By Pete Merges
Dante is a set of protocols for sending raw, uncompressed PCM audio over Ethernet. It is typically deployed on networks dedicated to audio equipment, as a smaller deployment allows for 1 ms latency across a network with 10 switches. The protocol is also routable, but often isn’t routed in practice because clock synchronization becomes difficult once additional network latency is introduced.
Dante uses a clocking protocol called PTP, standardized as IEEE 1588. It allows a clock source, referred to as the grandmaster clock, to synchronize with other devices at the sub-microsecond level. This cuts down on clock jitter in the resulting audio stream, a common issue with digital audio devices, especially those communicating over a higher-latency medium such as Ethernet. According to “A Security Analysis of the Precise Time Protocol (Short Paper)”, PTP has security issues if an attacker gains physical network access: the attacker can cause a DoS condition or target specific devices with forged clock messages, leading them to fall out of sync. For this post, I didn’t delve into attacking this protocol, but it is definitely worth looking into as an attack surface.
Dante’s audio transport method is unicasting or multicasting raw PCM across the wire in UDP datagrams. In the case I tested, 2 channels of 24-bit, 48 kHz audio sent over UDP port 14341, each datagram was 369 bytes: 9 bytes of header and 360 bytes of PCM audio.
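To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the 360-byte payload holds only whole, unpadded samples for both channels, which is my reading of the capture rather than something taken from a spec. The variable names are just for illustration.

```python
# Figures from the capture described above.
CHANNELS = 2
BIT_DEPTH = 24
SAMPLE_RATE_HZ = 48_000
PAYLOAD_BYTES = 360

# ASSUMPTION: the payload is nothing but whole samples, with no padding.
bytes_per_sample = BIT_DEPTH // 8                      # 3 bytes for 24-bit audio
bytes_per_sample_set = CHANNELS * bytes_per_sample     # one sample for every channel
samples_per_channel = PAYLOAD_BYTES // bytes_per_sample_set

# How much audio one datagram carries, and how often datagrams must be sent.
audio_ms_per_datagram = 1000 * samples_per_channel / SAMPLE_RATE_HZ
datagrams_per_second = SAMPLE_RATE_HZ / samples_per_channel

print(samples_per_channel, audio_ms_per_datagram, datagrams_per_second)
```

Under that assumption each datagram carries 60 samples per channel, or 1.25 ms of audio, which means a stream like this needs 800 datagrams every second.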
The header contains a 1-byte field holding the number of channels and an 8-byte field holding the timestamp of the start of the frame. The timestamp is a set amount later than the time at the host, which allows for controllable latency. From this, I was able to import the captured data into Audacity, a free audio tool, as raw PCM in the correct 48 kHz, 24-bit stereo format. A screenshot of that process is below.
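A minimal Python sketch of splitting a captured datagram into its header fields and PCM payload might look like the following. The field order (channel count first, then timestamp) and the big-endian byte order are assumptions on my part, and the names `parse_datagram` and `write_raw_pcm` are hypothetical helpers, not part of any Dante tooling.

```python
import struct
from typing import Iterable, NamedTuple

HEADER_LEN = 9  # 1-byte channel count + 8-byte timestamp, per the capture


class AudioDatagram(NamedTuple):
    channels: int
    timestamp: int
    pcm: bytes


def parse_datagram(data: bytes) -> AudioDatagram:
    """Split one UDP payload into header fields and raw PCM."""
    if len(data) < HEADER_LEN:
        raise ValueError("datagram is shorter than the 9-byte header")
    # ASSUMPTION: channel count comes first, followed by a big-endian
    # unsigned 64-bit timestamp. Adjust the format string if a capture
    # shows otherwise.
    channels, timestamp = struct.unpack_from(">BQ", data, 0)
    return AudioDatagram(channels, timestamp, data[HEADER_LEN:])


def write_raw_pcm(datagrams: Iterable[bytes], path: str) -> None:
    """Concatenate the PCM payloads of captured datagrams into one file."""
    with open(path, "wb") as out:
        for data in datagrams:
            out.write(parse_datagram(data).pcm)
```

The file written by `write_raw_pcm` contains headerless PCM, so it has to be opened with a raw-data import (in Audacity, File > Import > Raw Data) where the channel count, bit depth, and sample rate are entered by hand.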
Sending audio as raw PCM is certainly efficient and allows for low latency, but it also shows why running Dante on a network shared with other traffic is potentially a bad idea: anyone who can capture the traffic can recover the audio. For larger shows, often the only Dante traffic being sent is a copy of what is coming out of the speakers, which people can already hear, so this isn’t typically a problem. It becomes a problem when the protocol is used for more distributed A/V setups in conference centers, where confidential information is being spoken into the Dante network.
Another attack that takes advantage of this plaintext transport is injection, as there’s no authentication between the device sending audio and the one receiving it. An attacker would only need to capture the host’s timestamp values to spoof later datagrams and inject an audio source of their choosing. This is potentially troublesome for devices that aren’t easily accessible, such as self-powered speakers flown above the ground in a venue. The attacker would also have to forward all the PTP datagrams to the client so that it stays synchronized with the grandmaster clock and doesn’t go into shutdown mode.
Due to timing concerns, this attack would have to be implemented at a low level using something like DPDK (the Data Plane Development Kit) on Linux. The attacker’s PC would forward all packets except those containing audio data. The new audio data would be inserted into the frame in raw PCM format, such as is contained in a WAVE audio file. It would have to use the same number of channels, sample rate, and bit depth to sound correct. Thankfully, the Dante spec only allows for a few combinations with identical frame sizes. Possible sample rates are 44.1 kHz, 48 kHz, 96 kHz, and 192 kHz; channel counts are usually multiples of 2 with a maximum of 8; and the bit depth is always 24 or 32 bits. The number of channels is listed in the frame, so to determine the format an attacker only has to guess the bit depth. Once that’s correct, a short sample played at the wrong sample rate will sound pitched up or down from normal, so the sample rate can be found by guessing and checking by ear.
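The guess-and-check step can be sketched in Python using only the standard library. This is an illustration under assumptions, not a tested tool: it takes the sample rates and bit depths listed above as the full search space, it assumes a payload must hold a whole number of samples per channel, and it assumes the captured bytes are already in the little-endian order that WAV files expect (they would need to be byte-swapped first if not). The function names are made up for this example.

```python
import wave
from itertools import product

SAMPLE_RATES_HZ = (44_100, 48_000, 96_000, 192_000)  # rates listed above
BIT_DEPTHS = (24, 32)                                 # depths listed above


def candidate_formats(channels: int, pcm_len: int) -> list:
    """List (bit_depth, sample_rate) pairs consistent with the payload size."""
    candidates = []
    for depth, rate in product(BIT_DEPTHS, SAMPLE_RATES_HZ):
        bytes_per_sample_set = channels * (depth // 8)
        # ASSUMPTION: a payload always holds whole samples for every channel.
        if pcm_len % bytes_per_sample_set == 0:
            candidates.append((depth, rate))
    return candidates


def write_candidate_wavs(pcm: bytes, channels: int, prefix: str) -> list:
    """Wrap the same PCM bytes in one WAV file per candidate format."""
    paths = []
    for depth, rate in candidate_formats(channels, len(pcm)):
        path = f"{prefix}_{depth}bit_{rate}hz.wav"
        with wave.open(path, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(depth // 8)  # bytes per sample
            wav.setframerate(rate)
            # ASSUMPTION: pcm is little-endian, as the WAV format requires.
            wav.writeframes(pcm)
        paths.append(path)
    return paths
```

For the 2-channel, 360-byte payload from my capture, the size check alone rules nothing out, since 360 divides evenly for both bit depths. For an 8-channel stream with the same payload size, only the 24-bit candidates would survive. Either way, the remaining candidates still have to be auditioned by ear.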
Overall, this process showed me that Dante as a protocol is pretty vulnerable if an attacker can gain access to the network, but most of these attacks cause a disruption rather than an injection or a loss of data. The protocol definitely favors low latency over security and authentication, and for audio that just might be OK depending on the network setup. Encrypting and authenticating devices would create a lot more computational overhead than is reasonable to build into audio and video devices, since most of their processing and memory capability is oriented towards audio processing.