Covert Communications Through Music

By Alex McHugh


In computer security, a covert channel is a means of transferring information through something that is not supposed to allow communication. One classic example of a covert channel is steganography, the act of hiding information in plain sight. Today I will be discussing a Python program I wrote that takes in a wav file and modifies it so that it contains a hidden message but sounds unchanged. This program would allow two parties to exchange a message that anyone intercepting the file would be unlikely to notice, let alone interpret. If one party hid a message and then emailed the other the song file with a message body of “check out this song”, nobody would bat an eye, and most security controls would likely not block the message from being transmitted.

Makeup of a wav file

The most common format of a wav file is uncompressed audio in the linear pulse code modulation (LPCM) format. In its standard CD-quality form, LPCM stores two channels (stereo), each sampled 44,100 times per second (44,100 Hz). Each sample is a signed 16-bit integer, so the possible values range from -32,768 to 32,767. I took advantage of this in my program, knowing that a change to a single value would be extremely difficult to detect, and with 44,100 samples per second there was plenty of room to spread out the modified samples.
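
To make the sample format concrete, here is a small sketch using Python’s struct module. The constant names are mine and are not taken from the original program; the format string '<hh' is the same one the encoder later in this post uses.

```python
import struct

SAMPLE_RATE = 44100   # frames per second
CHANNELS = 2          # stereo: left and right
SAMPLE_WIDTH = 2      # bytes per sample (16 bits)

# A signed 16-bit sample covers -2**15 .. 2**15 - 1.
SAMPLE_MIN = -(2 ** 15)
SAMPLE_MAX = 2 ** 15 - 1

# One frame holds one sample per channel.
BYTES_PER_FRAME = CHANNELS * SAMPLE_WIDTH
BYTES_PER_SECOND = BYTES_PER_FRAME * SAMPLE_RATE

# '<hh' means two little-endian signed 16-bit integers (left, right).
frame = struct.pack('<hh', SAMPLE_MIN, SAMPLE_MAX)
left, right = struct.unpack('<hh', frame)
print(left, right, len(frame), BYTES_PER_SECOND)  # -32768 32767 4 176400
```

Each frame is only 4 bytes, yet a second of audio is 176,400 bytes, which is why a handful of altered samples is such a small fraction of the file.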

When running a script that prints out the values in every frame of the song the output looks like this:


Because the song is made up of audio waves, the integer sample values (the amplitude of the wave) are constantly increasing or decreasing. In the picture, you can see that this is a part of the song where the values are on the decline. This is another important factor I considered when deciding how to encode a message. Someone analyzing the song might notice a value that didn’t match the current trend of the sound wave and discover that it had been altered.
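
A script along these lines could produce that kind of listing. This is a sketch rather than the script used for the post: the function name read_frames is a placeholder, and it assumes 16-bit stereo input.

```python
import struct
import wave


def read_frames(path, limit=10):
    """Return up to `limit` (left, right) sample pairs from a 16-bit stereo wav."""
    frames = []
    with wave.open(path, 'rb') as song:
        # This sketch only handles the 16-bit stereo layout described in the post.
        if song.getnchannels() != 2 or song.getsampwidth() != 2:
            raise ValueError('expected 16-bit stereo audio')
        for _ in range(min(limit, song.getnframes())):
            # Each frame is 4 bytes: left sample, then right sample.
            frames.append(struct.unpack('<hh', song.readframes(1)))
    return frames
```

Calling read_frames on a song file (for example a hypothetical input.wav) and printing each pair gives one line of left and right values per frame, which makes the rising and falling runs easy to see.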

Building the Encoder

The full code for this project can be found at

After doing a small amount of research I discovered that Python has a built-in wave library that allows you to read from and write to .wav files.

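import struct
import wave


def writeToSong(output, left, right):
    # stand-in for the helper from the full project (not shown in this post):
    # assumed to pack one stereo frame as two little-endian 16-bit samples
    output.writeframes(struct.pack('<hh', left, right))


def encodeSong(message):
    # open the song files
    noise_output = wave.open('output.wav', 'w')
    noise_output.setparams((2, 2, 44100, 0, 'NONE', 'not compressed'))
    song = wave.open('input.wav', 'r')
    messagePointer = 0
    messageLength = len(message)
    totalFrames = song.getnframes()
    print(totalFrames)
    i = 0
    # loop through the frames of the song (a while loop, so that skipping
    # ahead by two frames actually moves the frame counter)
    while i < totalFrames:
        values = struct.unpack('<hh', song.readframes(1))
        # the current frame is always copied to the output unchanged
        writeToSong(noise_output, values[0], values[1])
        # hide bits if we are on one of the 22050th frames, message bits
        # remain, and two more frames are available to read
        if i % 22050 == 0 and messagePointer != messageLength and i + 2 < totalFrames:
            song.readframes(1)  # skip the in-between frame; it is replaced below
            newvalues = struct.unpack('<hh', song.readframes(1))
            i += 2
            # take the average of this frame and the frame 2 away
            # (integer division keeps the result a valid sample value)
            leftValue = (newvalues[0] + values[0]) // 2
            rightValue = (newvalues[1] + values[1]) // 2
            # increment the left value by one if we are writing a one
            if message[messagePointer] == '1':
                leftValue += 1
            messagePointer += 1
            # the right channel carries the next bit, if there is one
            if messagePointer != messageLength:
                if message[messagePointer] == '1':
                    rightValue += 1
                messagePointer += 1
            writeToSong(noise_output, leftValue, rightValue)
            writeToSong(noise_output, newvalues[0], newvalues[1])
        i += 1
    # close the song files
    song.close()
    noise_output.close()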


A minimum of 5 bits is required to represent all 26 letters of the alphabet plus a space (27 symbols, and 5 bits give 32 possible values). While Python does have built-in ways to convert ASCII characters to binary, they use 8 bits per character. To keep the encoding as compact as possible and allow a longer message to be sent, I created a dictionary that maps the letters A through Z and the space character to 5-bit codes.

The program starts by prompting the user for a message and then calls the encodeSong function to encode the message in the song. The encodeSong function opens both the input and output song files and loops through all the frames of the input song. My approach to hiding the message was to spread out the altered frames as much as possible while still leaving room for a decent-sized message. I settled on altering two frames every second of the song. This allows 4 bits of the message to be placed in the song every second, since a bit can be placed in both the left and right channels of each altered frame.

To do this, the code checks whether the frame number mod 22050 equals 0, since there are 44,100 frames per second. If so, the values of the current frame and the frame two frames later are averaged, and that average replaces the frame in between. The code then checks whether the current bit of the “binarified” message is a 0 or a 1. For a 0 the averaged value is left unchanged; for a 1 it is increased by one. The altered values are then written to the output wav file. This process is repeated until all the frames of the input song have been read.
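
The 5-bit dictionary itself is not shown in this post, so the following is only a sketch of what such a mapping could look like. The exact assignment (space as 0, A as 1, through Z as 26) is my assumption; the only thing the post implies is that a space is five zero bits. The names binarify and unbinarify are illustrative.

```python
import string

# Assumed table: ' ' -> 00000, 'A' -> 00001, ..., 'Z' -> 11010.
ALPHABET = ' ' + string.ascii_uppercase
CHAR_TO_BITS = {ch: format(i, '05b') for i, ch in enumerate(ALPHABET)}
BITS_TO_CHAR = {bits: ch for ch, bits in CHAR_TO_BITS.items()}


def binarify(message):
    """Turn a message of letters and spaces into a string of '0'/'1' characters."""
    return ''.join(CHAR_TO_BITS[ch] for ch in message.upper())


def unbinarify(bits):
    """Turn a bit string (a multiple of 5 bits long) back into text."""
    return ''.join(BITS_TO_CHAR[bits[i:i + 5]] for i in range(0, len(bits), 5))


print(binarify('HI'))  # 0100001001
```

Here "HI" takes 10 bits instead of the 16 that an 8-bit encoding would need, which at 4 bits per second is the difference between 2.5 and 4 seconds of audio.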


Decoding the song to recover the secret message mirrors the encoding process. The decode program goes through the song, stops every 22050 frames, and averages that frame’s value with the value two frames later. If the average matches the value of the frame in between, it knows that a 0 was written; if they do not match, it knows that a 1 was written. The program loops until it reaches 3 spaces in a row (15 bits of 0) and then uses the binary-to-ASCII dictionary to recover the plaintext message.
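
The decoder is not listed in this post, so here is a hedged sketch of the comparison it describes. To keep it short it works on an in-memory list of (left, right) pairs rather than a wav file. The function name decode_frames, the spacing parameter, and the 5-bit table (space as 0, A through Z as 1 to 26, unused codes shown as '?') are all assumptions of mine.

```python
import string

# Assumed 5-bit table; codes 27-31 are unused and decode to '?'.
ALPHABET = ' ' + string.ascii_uppercase + '?????'
TERMINATOR = '   '  # three spaces, i.e. 15 zero bits


def decode_frames(frames, spacing=22050):
    """Recover hidden text from a list of (left, right) sample pairs."""
    bits = ''
    text = ''
    # Visit each anchor frame i; the hidden bits sit in frame i + 1.
    for i in range(0, len(frames) - 2, spacing):
        before, middle, after = frames[i], frames[i + 1], frames[i + 2]
        for channel in (0, 1):  # left bit first, then right bit
            average = (before[channel] + after[channel]) // 2
            # A matching average is read as 0, anything else as 1.
            bits += '0' if middle[channel] == average else '1'
        # Convert every complete group of 5 bits into a character.
        while len(bits) >= 5:
            text += ALPHABET[int(bits[:5], 2)]
            bits = bits[5:]
            if text.endswith(TERMINATOR):
                return text[:-len(TERMINATOR)]
    return text
```

Using floor division for the average matters here: the decoder has to round exactly the same way the encoder did, or every odd sum would be misread as a 1.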

Lessons Learned and Improvements

Doing this project revealed how sensitive .wav files are. While figuring out the optimal method for altering the frames, I discovered that it was going to be a Goldilocks situation. If the edited frames were too close together, there was a crackling sound in the output song, which made it clear that the audio file had been tampered with. If the edited frames were too far apart, though, less of a message could be hidden in the song.
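
The capacity side of that trade-off can be put into rough numbers. The sketch below is an estimate under the scheme described in this post: 2 bits per altered position, 5 bits per character, and three trailing spaces as a terminator. The function and parameter names are illustrative.

```python
def capacity_chars(duration_seconds, frame_spacing=22050, sample_rate=44100,
                   bits_per_slot=2, bits_per_char=5, terminator_chars=3):
    """Estimate how many message characters fit in a song of the given length."""
    total_frames = int(duration_seconds * sample_rate)
    # One altered position at frame 0, then one every frame_spacing frames.
    slots = (total_frames + frame_spacing - 1) // frame_spacing
    total_bits = slots * bits_per_slot
    # Reserve room for the terminator that marks the end of the message.
    return max(total_bits // bits_per_char - terminator_chars, 0)


print(capacity_chars(180))                       # 141
print(capacity_chars(180, frame_spacing=44100))  # 69
```

So a three-minute song holds roughly 141 characters at the spacing I settled on, and doubling the spacing roughly halves that.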

One of my other attempts at hiding the message in the song involved simply replacing the frame’s value with a zero or a one. Because the file is made up of sound waves, the values of the audio are constantly increasing or decreasing. Placing a value of one or zero breaks the sound wave and makes the song sound changed. If someone analyzing the song saw a value of, for example, (0,0) in between two other values around 20,000, they might suspect that the song had been altered. Therefore, I decided to use the averaging method so that the altered values follow the wave and are difficult to pick up.
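
A quick way to see the difference is to measure how far a sample sits from the average of its two neighbors, which is roughly what a simple analysis might check. The sample values below are made up for illustration, and neighbor_error is a hypothetical helper, not part of the original program.

```python
def neighbor_error(samples, i):
    """Distance between sample i and the average of its two neighbors."""
    return abs(samples[i] - (samples[i - 1] + samples[i + 1]) // 2)


original = [20000, 20110, 20230]                        # hypothetical rising run
zeroed = [20000, 0, 20230]                              # bit written as a raw 0
averaged = [20000, (20000 + 20230) // 2 + 1, 20230]     # average plus one for a 1 bit

print(neighbor_error(original, 1))  # 5
print(neighbor_error(zeroed, 1))    # 20115
print(neighbor_error(averaged, 1))  # 1
```

The raw zero stands out by four orders of magnitude, while the averaged sample sits closer to its neighbors than the original one did.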

While writing this post I realized that I could improve the hiding method by not changing the left and right channel values in the same frame. I could have spaced the altered frames closer together and altered only the left or the right value in each one.

Conclusion and Similar Work

During my research for this project I came across a similar application called MP3Stego. This eleven-year-old application does what my program does: it takes in a message and a song and hides the message in an output file of the song. It improves slightly on my method by compressing and encrypting the input data before hiding it.

While the program I wrote takes in an ASCII message and converts it to binary, anything could be hidden in the song without changing how it sounds. As long as the method used to hide the message doesn’t change the sound, it would be extremely difficult to detect that the song contains something of interest.
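
As a sketch of that generalization, arbitrary bytes can be turned into the same kind of bit string the encoder consumes. The helper names are mine, and a real implementation would also need a different end marker (for example a length prefix), since 15 zero bits can occur naturally in binary data.

```python
def bytes_to_bits(data):
    """Turn bytes into a string of '0'/'1' characters, 8 bits per byte."""
    return ''.join(format(byte, '08b') for byte in data)


def bits_to_bytes(bits):
    """Turn a bit string (a multiple of 8 bits long) back into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_bits(b'Hi'))  # 0100100001101001
```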

