By Kaitlin Keenan
While traditional cryptographic hashing algorithms, such as Message Digest 5 (MD5) and Secure Hash Algorithm 1 (SHA-1), are useful for finding identical items, they are not well suited to finding similar ones. This is where Context Triggered Piecewise Hashing (CTPH) comes into play. Also known as fuzzy hashing, CTPH compares two files and assigns a level of similarity between them, which makes it possible to match files that are nearly identical in content. This level of similarity can aid forensic investigators in finding potentially incriminating digital items, such as copyrighted materials and various types of malware, that may have been only slightly altered from their original counterparts.
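To illustrate why cryptographic hashes only help with identical items, here is a minimal sketch using Python's standard `hashlib` module. The helper names `digests` and `differing_positions` and the sample data are made up for this illustration. Flipping a single bit in the input produces MD5 and SHA-1 digests that have no useful relationship to the originals, so these digests cannot be used to measure similarity.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    assert len(a) == len(b)
    return sum(1 for x, y in zip(a, b) if x != y)


# Illustrative data only: a repeated sentence stands in for a real file.
original = b"The quick brown fox jumps over the lazy dog" * 100

# Flip one bit in the first byte to simulate a very slight alteration.
altered = bytearray(original)
altered[0] ^= 0x01
altered = bytes(altered)

before = digests(original)
after = digests(altered)

for name in ("md5", "sha1"):
    changed = differing_positions(before[name], after[name])
    print(f"{name}: {changed} of {len(before[name])} hex characters differ")
```

An identical copy of the input always yields the same digests, which is what makes these algorithms good at exact matching and poor at near matching.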
Although the fuzzy hashing procedure is a positive addition to the security community, an individual may still wish to protect their digital files from being discovered by such an effective technique. While there are a number of tools available that perform fuzzy hashing, such as ssdeep and VirusTotal, there is a clear lack of tools available that perform anti-fuzzy hashing measures. This is where antifuzz comes in.
Antifuzz is a proof-of-concept command-line utility, written in Python, that aims to thwart the fuzzy hashing technique. The tool, which is shared under the MIT License, takes a user-provided Moving Picture Experts Group Audio Layer 3 (MP3) file and slightly modifies it by changing its volume by a scale of one. This change in volume is done with the help of LAME Ain’t an MP3 Encoder (LAME), a program that can be used to create MP3 audio files. The user can choose whether to overwrite the original file or save the modified one as a new, separate file. While it may be more beneficial to keep only one of these files on a machine, a user may wish to save both for testing and for the initial comparison. In addition to using LAME to change the file’s volume, the user has the option to edit some of the file’s metadata. Antifuzz performs this change automatically and randomly, and it can serve as an additional layer of protection against fuzzy hashing. Although the modification only needs to be run once to produce an effective result, the program can be run on the same file more than once to produce different hashes while still achieving the desired level of dissimilarity.
Once the modification has been made, antifuzz uses ssdeep, a program that creates and compares hashes via the fuzzy hashing technique. Ssdeep compares the hash of the original file to the hash of the new, slightly modified one in order to determine a percentage of similarity between the two files.
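The workflow described above can be approximated with a short sketch. This is not antifuzz's actual code: the function names are hypothetical, and it assumes that the "scale of one" corresponds to LAME's `--scale` option and that the third-party `ssdeep` Python package is installed for the comparison step. The metadata edit is not covered here.

```python
import subprocess


def build_lame_command(src, dst, scale=1.0):
    """Build a LAME command line that re-encodes an MP3 with a volume scale.

    Assumption: antifuzz's exact invocation is not given in this post;
    --mp3input and --scale are standard LAME options used for illustration.
    """
    return ["lame", "--mp3input", "--scale", str(scale), str(src), str(dst)]


def reencode(src, dst, scale=1.0):
    """Run LAME to write a re-encoded copy (requires `lame` on the PATH)."""
    subprocess.run(build_lame_command(src, dst, scale), check=True)


def similarity(path_a, path_b):
    """Return ssdeep's match score (0 to 100) for two files.

    Requires the third-party `ssdeep` package, imported lazily so the
    rest of this sketch can be used without it.
    """
    import ssdeep

    hash_a = ssdeep.hash_from_file(str(path_a))
    hash_b = ssdeep.hash_from_file(str(path_b))
    return ssdeep.compare(hash_a, hash_b)


def reencode_and_compare(src, dst, scale=1.0):
    """Re-encode src into dst, then report how similar ssdeep finds them."""
    reencode(src, dst, scale)
    return similarity(src, dst)
```

Calling `reencode_and_compare("song.mp3", "song_modified.mp3")` with placeholder file names would write the modified copy as a separate file and return the similarity score, which mirrors the option of keeping both files for the initial comparison.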
According to this tool, whose GitHub link will be provided at the end of this post, there is a zero percent level of similarity between the hashes of the two files. Although the file has been ever so slightly modified, the user will not notice a significant difference when playing back the new MP3 file, as it will sound identical to the original. Anyone interested in sampling the final product before testing the tool on their own can find an example MP3 file and its slightly modified counterpart on the project’s GitHub page.
Unfortunately, the antifuzz tool has a few drawbacks. The first and most obvious is that it currently supports only MP3 files, which makes it of little use for protecting other file types. Hopefully, future versions of antifuzz will support other file types, such as text, image, and video files. Furthermore, running the program on a file more than ten times is not recommended, as it appears that noticeable distortion of the audio is present at and beyond that point. This is demonstrated by an MP3 file on the project’s GitHub page that has had the program run against it ten times.
While antifuzz is a great first step toward thwarting fuzzy hashing, it is far from a complete and robust solution, which is why it is advertised merely as a proof-of-concept tool. Because the only modification option is to use LAME to change the file’s volume by a scale of one, it may be possible to reverse engineer the process and prove that the files are, in fact, the same after all. Hopefully, future versions of antifuzz will include several different modification options that achieve the same result more subtly in order to avoid detection. Furthermore, antifuzz relies solely on ssdeep for the final comparison and similarity score. Forensic investigators will not limit themselves to one tool, so it would be most beneficial to test against other fuzzy hashing tools, such as VirusTotal, in order to determine how effective antifuzz can be.
In summary, traditional cryptographic hashing algorithms, such as MD5 and SHA-1, are great for finding identical items but not ones that are merely similar. CTPH, also known as fuzzy hashing, allows for the detection of items whose content varies only slightly. Tools such as ssdeep and VirusTotal perform fuzzy hashing and can aid forensic investigators in finding potentially incriminating digital evidence. Unfortunately, there is a lack of tools available to thwart the fuzzy hashing technique, which is where antifuzz comes in. Antifuzz is a proof-of-concept command-line utility that slightly modifies a given MP3 file in order to make its hash appear completely different from that of the original file. Although ssdeep finds the files to be completely dissimilar, the user will not notice a significant change to the audio file upon playback. While antifuzz is a great start, it has its own issues and drawbacks that prevent it from being a complete and robust anti-forensics solution. Hopefully, future versions of antifuzz will include improvements that allow the tool to move beyond the proof-of-concept stage.