Introduction to PDF Analysis

By Corinne Smith

Receiving malicious PDFs is common in the corporate world. For example, an attacker may pose as a service provider to the business and send a malicious PDF named “Invoice” to the Accounts Payable department. When opened, the file triggers some form of action that leads to the attacker gaining access to the employee’s computer. Once the attack is detected, the security team needs to analyze the malicious PDF to determine what its impact was and whether the attack was successful.

NOTE: This blog post is by no means a be-all end-all to analyzing PDFs, but rather a stepping stone to start the process that can be molded to fit the requirements of the current situation.

WARNING: Always work with potentially malicious objects in a controlled environment (preferably a virtual machine with a snapshot to roll back to). Inspecting a malicious object can infect your own system too, so take precautions against that.

To practice, I used make-pdf-javascript.py from the make-pdf tools in Didier Stevens’s PDF Tools (2), which creates a PDF file with embedded JavaScript that opens an alert box when the PDF is opened in a browser.

Step One: Recon

The first step of any analysis is reconnaissance. To start, it is useful to run existing tools that give you a better view of what the file structure looks like and possibly what the file is trying to do. One such tool is PDFiD (found here (2)). It takes a PDF file as an argument and searches for specific keyword strings that may indicate malicious content inside the PDF: JavaScript and RichMedia (Flash), actions that trigger when the page is viewed, encryption, and several other strings. Another good tool, VirusTotal (VT), can be used to see what antivirus vendors think about the file, whether the file has been seen before, and file details that may shed some light on what the file contains. One thing I noticed when trying a test file in VT is that it makes use of the PDFiD tool and provides the same information that running the tool itself would. The differences between the two can be seen below.

[Image: VirusTotal results]

[Image: PDFiD results]
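
If you want to check whether VT has seen a file before without uploading the sample, one option is to search for the file's hash instead. The following is a minimal sketch using only the Python standard library; the function name sha256_of_file is my own, not part of any tool mentioned here.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file on the suspect PDF returns a hex string that can be pasted into the VT search box. Hashing only reads the file; it does not open or render it.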

Even if the tools you are using don’t turn up anything, don’t just assume the file is safe. For example, PDFiD only searches for eighteen of the possible keyword strings a PDF can contain. If you are unsatisfied with the results from your tools (maybe because you already know that the file is malicious), you can always take the time to write your own scripts to do a deeper dive on the file you are inspecting. A list of over a thousand PDF keyword strings that you can draw from can be found here (1).
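
As a starting point for such a script, here is a rough sketch of a keyword counter in Python. It is not how PDFiD itself is implemented; the KEYWORDS list and the function names are my own choices for illustration, and the list should be extended with whatever strings matter for your case. It also undoes the #xx hex escapes that PDF names allow (for example /J#61vaScript), since those can hide a keyword from a plain string search.

```python
import re
from collections import Counter

# Illustrative watch list (an assumption, not PDFiD's list); extend as needed.
KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch",
            b"/EmbeddedFile", b"/RichMedia", b"/Encrypt"]


def decode_name_escapes(data):
    """Replace #xx hex escapes (e.g. /J#61vaScript) with the byte they encode.

    This is applied to the whole buffer, which is fine for counting but
    would not be safe for rewriting the file.
    """
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def count_keywords(data, keywords=KEYWORDS):
    """Count how often each keyword appears as a whole name in the raw bytes."""
    normalized = decode_name_escapes(data)
    counts = Counter()
    for keyword in keywords:
        # Require a non-alphanumeric byte (or end of data) after the keyword
        # so that /JS does not also match a longer name such as /JSX.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, normalized))
    return counts
```

To use it, read the file in binary mode and pass the bytes to count_keywords. Keep in mind that this only sees what is visible in the raw file: keywords inside compressed object streams will not be counted unless the streams are decompressed first.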

Step Two: Inspection

Once you have completed your recon, the next step is to take the information collected about the file and do further analysis on the malicious PDF. There are many tools that exist online that will help you with data extraction and inspection. Lenny Zeltser’s page (4) is a great launch pad for some example tools.

Running PDFiD on our example file uncovered both /JavaScript and /OpenAction within the file. Those two together are usually considered a red flag and bear further investigation. Since PDFiD identified JavaScript in our example file, the next step is to extract the JavaScript and inspect it. Many tools for extracting JavaScript from a PDF appear to exist; however, most have been discontinued or have disappeared. Your best bet is using peepdf (3) and the JavaScript commands within the peepdf package.
app.alert({cMsg: 'Hello from PDF JavaScript', cTitle: 'Testing PDF JavaScript', nIcon: 3});

Once we run the file through peepdf, we can see that the JavaScript is just creating an alert box. However, it will often not be this easy. Attackers might base64 encode the JavaScript to get it past malware analysis tools, and that’s one more step you have to take to decode what is happening.
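
If the extracted script does turn out to be base64 encoded, the decoding step might look like the sketch below. It uses Python's standard base64 module; the function name decode_base64_script is hypothetical, and the sketch assumes the whole string is a single base64 blob rather than one layer of several.

```python
import base64
import binascii


def decode_base64_script(encoded):
    """Decode a base64 string to text, or return None if it is not valid base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently skipping them.
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Undecodable bytes are replaced so the result can always be printed.
    return raw.decode("utf-8", errors="replace")
```

The result is only text to read. Decode and inspect it, but do not execute it outside of your controlled environment.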

Step Three: Remediation

Although this step is outside the scope of this blog post, it is still a very important one to consider. You need to take the details that you discovered in step two and figure out what needs to be done to resolve the issue. Are there holes in the security system that the attacker exploited? Is there a piece of out-of-date software that was leveraged in this attack and can be updated to prevent it from happening again? Is this something that can be avoided in the future by making employees more aware of what a phishing email looks like? These things and more all need to be considered when remediating any security-related incident.

Resources

  1. https://github.com/mirrorer/afl/blob/master/dictionaries/pdf.dict
  2. https://blog.didierstevens.com/programs/pdf-tools/
  3. http://eternal-todo.com/tools/peepdf-pdf-analysis-tool
  4. https://zeltser.com/analyzing-malicious-documents/