By Quan Huynh
To get started on reverse engineering malware, it is essential to have a separate sandbox. I recommend virtual machine software such as VMware or VirtualBox, because reverse engineering on your home computer can expose it to dangerous malware.
There are two types of analysis when it comes to reverse engineering: static and dynamic. Ideally you want to begin with static analysis, then move on to dynamic analysis. Static analysis allows you to pinpoint important parts of a binary and helps you decipher what it is trying to do. In some cases, static analysis will allow you to find temporary files that a binary can create, helping you to remediate the problem. Another important part of malware is persistence. This can include Windows registry keys and Linux crontabs. Some malware creates its own processes and adds itself to startup, helping it achieve persistence. Dynamic analysis is where you put it all together. When running the malware, you can find all the files that it has created and all the processes that it has spawned. Dynamic analysis will also allow you to find network signatures from the attacker, as malware such as spyware sends information back to a certain location. This means you may be able to track down the IP address that the information is sent to.
Usually, to start, you would want to check whether the malware has already been detected by using an online scanner such as VirusTotal, but since I already know it is a virus, I will skip this step.
To start off, you want to check the strings within the malware to see if you can find any useful information in the binary. This step is important as it can reveal whether a binary is malicious or not. Before you start reverse engineering malware, you have to find evidence that points to the binary being potentially malicious. This can be done in many ways, as there are a couple of tools such as strings, PEview, and Detect It Easy. Running this malware through strings, you can find a couple of interesting things such as “Microsoft Visual C++ Runtime Library”, which can be an indication that this malware was written in C++. You can also find a substantial number of function names that are used in this binary.
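As a rough illustration of what a strings-style tool does, here is a minimal Python sketch that pulls runs of printable ASCII out of raw bytes. The function name `extract_strings`, the minimum length of 4, and the sample bytes are my own assumptions for the example; the real strings tool has more options (for instance, wide-character strings) that this does not cover.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII characters of at least min_len bytes."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up bytes standing in for part of a binary; not taken from the sample.
    blob = b"\x00\x01Microsoft Visual C++ Runtime Library\x00abc\x00GetVersion\x00"
    for text in extract_strings(blob):
        print(text)
```

To try it on a real file, you could pass the result of `open(path, "rb").read()` to `extract_strings`, ideally inside the sandbox.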
Next, to give us a little more information about the binary, I use Detect It Easy (DiE). DiE detects compilers and packers. You can see below that the binary was compiled using Microsoft Visual C/C++, which supports the idea that the malware is written in C++. We can also see that the entry point of the binary, which leads to the main function, is 0x00401aa0. This will be useful when we get to the disassembler. Though this one is not packed, you should always be on the lookout for VirtualAlloc and randomized, unreadable strings, which can be giveaways that the binary is packed. A binary being packed or using VirtualAlloc doesn’t always mean it is malicious, but it should raise suspicion.
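One common way to put a number on "randomized unreadable" data is Shannon entropy, measured in bits per byte. The sketch below is only an illustration of that idea, not how DiE itself works. The function name is hypothetical, and any cutoff you pick (values close to 8 are often read as a hint of compression or encryption) is a rule of thumb rather than proof of packing.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    readable = b"This program cannot be run in DOS mode." * 10
    uniform = bytes(range(256)) * 10  # every byte value equally likely
    print(f"readable text: {shannon_entropy(readable):.2f} bits/byte")
    print(f"uniform bytes: {shannon_entropy(uniform):.2f} bits/byte")
```

Readable text and ordinary code tend to score noticeably lower than compressed or encrypted data, which is why a very high score on a section is worth a closer look.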
Static Analysis – Disassembler
Before we jump into a disassembler, we must make note of the registers, because a disassembler turns machine code (compiled from a language such as C or C++) back into assembly code. Registers include but are not limited to: EAX, ECX, EBX, EDX, ESI, EDI, ESP, and EBP. EAX is used for arithmetic operations and return values of functions, ECX is your counter register, EBX is a base pointer to your data, EDX is also used for arithmetic operations as well as input/output operations, ESI is a pointer to the source of stream operations while EDI is for the destination, ESP points to the top of the stack, and finally EBP points to the base of the stack frame. You will see these registers very often when reverse engineering malware.
I will be using Ghidra for disassembly. Ghidra is a free open-source disassembler/decompiler that was created by the NSA for reverse engineering. It has different features that can assist you in reverse engineering.
To start up Ghidra, you can use the command ./ghidraRun on Mac and Linux machines, while on Windows there is a batch file called ghidraRun.bat. When starting up Ghidra, you should see a window that looks like this.
You can create your own project and place binaries into one central folder. To begin reverse engineering, you can create a new project, then drag the file into the active project and click it. This should start up the disassembler/decompiler. Once it has started up, you should get a prompt asking if you want to start analyzing the binary, and you want to click Yes. It should bring up another prompt with options for analysis. I usually just go with the defaults.
After Ghidra finishes auto-analyzing the binary, you should see assembly code in the Listing window along with various other windows. On the left is the decompiled code of the binary, although there are some variables that Ghidra cannot fully recognize. In the Symbol Tree, you can find valuable information such as imports, addresses of functions, and the entry point of the binary. The entry point of the binary is valuable as it can take you to the main function.
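The entry point address that tools report comes from the PE header of the file. As a minimal sketch of where that value lives, the Python below reads the AddressOfEntryPoint and ImageBase fields and adds them together. The function name `entry_point_va` and the synthetic header in the demo are assumptions for illustration. In particular, the image base of 0x00400000 is just a common default for 32-bit executables and was not read from this sample.

```python
import struct


def entry_point_va(pe: bytes) -> int:
    """Return ImageBase + AddressOfEntryPoint for a PE32 or PE32+ image."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = e_lfanew + 24
    (magic,) = struct.unpack_from("<H", pe, opt)
    (entry_rva,) = struct.unpack_from("<I", pe, opt + 16)
    if magic == 0x10B:  # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", pe, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", pe, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return image_base + entry_rva


if __name__ == "__main__":
    # Synthetic, mostly empty header built only for this demo.
    fake = bytearray(0x200)
    fake[0:2] = b"MZ"
    struct.pack_into("<I", fake, 0x3C, 0x80)
    fake[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<H", fake, 0x80 + 24, 0x10B)
    struct.pack_into("<I", fake, 0x80 + 24 + 16, 0x1AA0)      # assumed RVA
    struct.pack_into("<I", fake, 0x80 + 24 + 28, 0x00400000)  # assumed base
    print(hex(entry_point_va(bytes(fake))))
```

Keep in mind that the entry point is usually compiler runtime startup code that eventually calls the main function, so expect to follow a few calls before reaching the author's own code.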
Under Imports in the Symbol Tree, you can find the imported libraries that this binary is using. This is how the binary will be calling the Windows API. If you click the drop-down, you can find the APIs that are used from that specific library. This is important for us to be able to narrow down what the binary is trying to do on the system. Registry APIs should raise red flags that the binary is trying to create persistence on the system. There are many different registry keys that can help the malware stay on the system.
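When the import list is long, it can help to sort the API names into rough buckets before reading them one by one. The sketch below shows one way to do that. The bucket names and the prefixes in them are my own shortlist chosen for illustration, not a complete or authoritative catalogue, and the sample import list is made up rather than copied from this binary.

```python
# Hypothetical buckets of Windows API name prefixes worth a second look.
CATEGORIES = {
    "registry": ("RegCreateKey", "RegSetValue", "RegOpenKey", "RegDeleteKey"),
    "file": ("CreateFile", "WriteFile", "DeleteFile", "GetTempPath"),
    "memory": ("VirtualAlloc", "VirtualProtect", "WriteProcessMemory"),
    "recon": ("GetVersion", "GetCommandLine", "GetModuleFileName", "GetStartupInfo"),
}


def triage_imports(imports: list[str]) -> dict[str, list[str]]:
    """Group imported API names by category; unmatched names are left out."""
    grouped: dict[str, list[str]] = {}
    for name in imports:
        for category, prefixes in CATEGORIES.items():
            # Prefix match so the A/W/Ex variants (e.g. RegSetValueExA) also hit.
            if name.startswith(prefixes):
                grouped.setdefault(category, []).append(name)
                break
    return grouped


if __name__ == "__main__":
    sample = ["RegSetValueExA", "WriteFile", "GetVersionExA", "MessageBoxA"]
    for category, names in triage_imports(sample).items():
        print(category, names)
```

A hit in one of these buckets is only a pointer to where to look in the disassembly. Plenty of legitimate software uses the same APIs.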
For a better view of all the APIs that the binary will be using, you can click on “Window” in the top menu and then find “Symbol References.” This is a great window, as not only will it give you the APIs, but also the references showing where each API is being used.
Another great tool to use is under “Analysis.” This will have Ghidra collect all the strings so you can view them, much like the strings tool that was used earlier. To look at the collected strings, you can go to “Search” and then “For Strings.”
In the Symbol references window, you can click on an API, and it will show you all the references to that API. On the right side, you can see all the subroutines (functions) that are calling that API, so if you click on the subroutine, it will take you to the location of the call.
To start, I like to go back to the entry point of the binary. Then, clicking on the graph view gives a visual representation of what the code execution will look like.
Clicking on LAB_00406163 takes us to this function right here. In the picture below, you can see that it is pushing three things onto the stack: -0x1, DAT_00444b60 and LAB_00429560. On further inspection, DAT_00444b60 and LAB_00429560 are both numerical values. Going down a little more, you can see that it is calling GetVersion.
Scrolling down a bit, there are two interesting function calls and a Jump if Not Zero (JNZ) instruction. This means there are conditionals in this function.
Inside FUN_0042912e, you see that the function is comparing DAT_00451471 with 1, and if they are equal, the zero flag (ZF) is set. If they are not equal, the JNZ will jump to LAB_0042913c, skipping FUN_00406797, then call FUN_0041ae43 and then call ExitProcess, terminating itself. So, with that we know that the function FUN_0042912e checks for the version of Windows and exits if it is not the correct version.
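If the relationship between the compare and the jump is new to you, here is a small Python model of it. CMP subtracts its operands and sets the zero flag when the result is zero, and JNZ jumps only when that flag is clear. The labels and function names are the ones from the walkthrough, but the model is a sketch: the helper names are hypothetical, and I am assuming that the fall-through path continues into LAB_0042913c after calling FUN_00406797.

```python
def zero_flag_after_cmp(left: int, right: int, bits: int = 32) -> bool:
    """Model CMP: compute left - right and report whether ZF would be set."""
    mask = (1 << bits) - 1
    return ((left - right) & mask) == 0


def trace_branch(dat_00451471: int) -> list[str]:
    """List the steps taken for a given value of DAT_00451471."""
    steps = []
    zf = zero_flag_after_cmp(dat_00451471, 1)
    if zf:
        # Values are equal, ZF is set, JNZ is not taken: fall through.
        steps.append("FUN_00406797")
    # JNZ target; assumed to also be reached by fall-through.
    steps += ["LAB_0042913c", "FUN_0041ae43", "ExitProcess"]
    return steps


if __name__ == "__main__":
    print(trace_branch(0))  # not equal to 1, so the jump is taken
    print(trace_branch(1))  # equal to 1, so FUN_00406797 runs first
```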
Now we investigate FUN_00406797, and in here you can see there is another conditional.
Scrolling down, you can see that the binary is calling WriteFile after pushing parameter values onto the stack.
We can now infer that the binary is checking if a certain file exists, and if it does not, it will create another one. You can see that the binary calls GetModuleFileNameA if a certain conditional is met, with another conditional pushing “<program_name_unknown>”. With all the information we have found, we can infer that this virus could be a Trojan that tries to steal information.
Now that we have figured out the first functions, we can go back to main and look at the other functions. If we go down in the main function, we can see that the binary is calling GetStartupInfoA and GetModuleHandleA, giving us a little more idea of what the binary is trying to do.
We can keep traversing through the graph view and following the functions to find more function calls that can give us more clues of what the binary is trying to do.
Going into FUN_00406bc0 to investigate, we can see that the function is calling GetCommandLineW, GetVersionExA, and GetModuleHandleA, giving more confirmation that the binary is trying to steal information. Now that we see that the binary is trying to get the command line, we have to be wary of it possibly being part of a botnet.
Now we know that it is trying to steal information. What we don’t know is where it is going to store it or how it will send the information back to the attacker. To try and find this information, we will use the “Search” tool that we used earlier. Inputting “Create” into the search box, we will let Ghidra find all text that has the string “Create” in it. Following the references takes us to LAB_00416fdb, where you can see that there is a string for “heo9.tmp”. Now we know one file, but we do not know the full path yet.
The binary also creates a window and uses rich edit. Rich edit allows you to add links and save text.
Searching further through the strings, I was not able to find any possible paths, but what I did find intriguing was that the registry API functions were not found in the binary, meaning they may be loaded at runtime.
To begin dynamic analysis, it is vital that this is done in a virtual machine. Doing dynamic analysis on your main machine will most likely infect it. Before running the malware, remember to create a snapshot of your virtual machine, as it will allow you to revert any damage caused by the malware. It is also advised to turn off your internet connection so the malware cannot spread. Since I was using macOS, I switched to my Windows machine so I could properly run the malware.
Knowing that this malware has a possibility of connecting to a malicious site, you should start up ApateDNS, which will allow you to route the traffic back to yourself. A couple more programs you want to start up are Process Monitor, Process Explorer, and Regshot, which takes a snapshot of your registry keys. Make sure you take note of what was on your machine before the malware was run.
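The before-and-after comparison is the core idea behind taking snapshots: record the state, run the sample, record the state again, and look only at what differs. The sketch below shows that comparison on two plain dictionaries. It is not how Regshot is implemented, and the key names and values in the demo are invented placeholders.

```python
def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, dict]:
    """Compare two {key: value} snapshots and report what changed."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Placeholder data; real snapshots would come from your own tooling.
    before = {r"HKCU\Software\Example\Mode": "1"}
    after = {
        r"HKCU\Software\Example\Mode": "2",
        r"HKCU\Software\Example\NewValue": "sample.exe",
    }
    report = diff_snapshots(before, after)
    for kind, entries in report.items():
        print(kind, entries)
```

The same comparison works for a list of files in a folder or a list of running processes, as long as both snapshots are taken the same way.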
Once the malware was run, you can see that the machine is trying to connect to ip.anysrc.net, which, when looked up, comes back as a suspicious site from Germany.
You can see from ProcMon that the malware has created wininit.exe, spawning off multiple executables, and when you try to look at it, much of the information about the program is gone. From this information, we now know to look for the PID of “wininit.exe” or the PIDs of its spawned processes in Process Explorer.
Now in Process Monitor, we can filter on the parent PIDs to find the child PIDs of the malware.
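Filtering by parent PID is really a walk down the process tree: start at the malware's PID, find every process whose parent is that PID, then repeat for each child. Here is a minimal sketch of that walk over a list of (PID, parent PID) pairs. The function name and the PIDs in the demo are made up for illustration and are not taken from the ProcMon capture.

```python
from collections import defaultdict, deque


def descendants(processes: list[tuple[int, int]], root_pid: int) -> list[int]:
    """Return all PIDs spawned directly or indirectly by root_pid."""
    children = defaultdict(list)
    for pid, parent_pid in processes:
        children[parent_pid].append(pid)

    found = []
    seen = {root_pid}  # guards against loops caused by reused PIDs
    queue = deque([root_pid])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


if __name__ == "__main__":
    # Invented (pid, parent_pid) pairs standing in for a process listing.
    listing = [(100, 4), (200, 100), (201, 100), (300, 200), (999, 4)]
    print(descendants(listing, 100))
```

Every PID this returns is a candidate for its own filter in Process Monitor, since a child process may be the one doing the file and registry work.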
Now when looking at ProcMon, you can see that the malware is creating files in the Temp folder as well as opening and querying the registry keys.
For the attacker to receive the information, there is a UDP send operation, most likely sending everything back to the attacker.
Furthermore, it’s collecting information about the machine, and then locking it. When you try to jump to the file, it is now locked, and even an account with administrator privileges cannot access the file.
When looking further into the registry values that were set, you can find that the malware has added new values to the schedule registry key.
After doing both static and dynamic analysis on the malware, you can get a pretty good sense of what the malware is trying to do. In this case, the malware is gathering information about the computer and its architecture, placing that information in different temp folders, and then sending it back to the attacker through a malicious site.