Running a Simple GPU Program using PyCUDA

By Nicole Holden

                Over the last couple of years, I have written a few fuzzers for various class assignments. Each fuzzer has had different requirements and needed the ability to attack different apps. The first few were written in Python because that’s what was required, and subsequent versions were written in Python because that’s what was comfortable.

                This latest iteration required a fair bit of scraping for data, and it was slow despite being threaded and using multiple CPU cores. I was only managing about 1500 complete requests per minute (each request had to be analyzed and the data I wanted pulled out of the response, then fed back into the program). I know this can be done faster: my Python code likely wasn’t as efficient as it could be, and I could have gained even more by writing the program in something like C.

                Lately though, with cryptocurrency and big data, it seems to me like there is a lot of work and interest in leveraging GPU computing. There used to be a big barrier to getting into GPU computing because you had to know C++ and often other languages. To make it easier, Nvidia released a developer package called CUDA (which you can find here). After reading about it a little, it seems easier, but still not entirely simple. Linked on the download page is a post called “An Even Easier Introduction to CUDA” (here), but it’s written for C and C++ developers, which isn’t exactly what I was looking for. In pursuit of an easier solution I stumbled on a Python package called PyCUDA.

                PyCUDA does a lot of handy things for us: aside from letting us program in Python, it handles CUDA exceptions for us and still executes a lot of the backend in C++. This seems like a bonus: I would get to stay comfortably in Python but get many of the speed benefits of C++. Score!

                Getting PyCUDA set up, however, turned into an event. The installation instructions in the official PyCUDA documentation are for Windows 7 using Visual Studio Professional 2008. Initially this was concerning: what if this package isn’t maintained anymore? However, looking through the git repository I found 39 releases between 2008 and September of this year, so it seems like the package is still being maintained, and it’s still listed as a popular solution for anyone wanting to get into GPU programming in Python.

                Unfortunately, since I needed this to work on my personal gaming computer, as it’s the only one with an Nvidia card, I needed to figure out how to get this to work with Windows 10 and Visual Studio Community 2017. There are no consolidated guides for this. The answers on how to get this to work do sort of exist, but they are spread out over many Stack Overflow posts, Microsoft posts, and various other blog posts where people were frequently stuck getting this all up and running.

                Using the latest version of PyCharm (v2018.3), Visual Studio Community 2017, Python 2.7, and CUDA 10.0 on Windows 10 with an Nvidia GTX 1070 graphics card, I was able to get a simple test program running. The process was long, and if you are interested, after the installation guide I have included all the snags I encountered, as well as what I tried in order to fix each one (whether it worked or not). I do think it’s sometimes important to show that it can be difficult to get things working, and that’s okay if you don’t give up, which is why I’ve included the good, the bad, and the downright facepalm moments of my attempt at getting PyCUDA running.

PyCUDA Installation (Windows 10) Summary:

Note: This installation guide is my best guess at the minimum steps and components required to get PyCUDA running given the listed software. There may be components that aren’t needed, which I have indicated.

  1. Install Python 2.7 and make sure it’s in your PATH (it should be added automatically but may require a restart). Later versions of Python did not work for me, as they threw memory access errors.
  2. Install PyCharm and create a new project, then set the interpreter to Python 2.7.
  3. Make sure PyCharm is using pip 9.0.3 and not a newer version, as later versions of pip do not work well with PyCharm (this is a known and long-standing bug).
  4. Install the latest version of Nvidia’s CUDA (this requires an Nvidia graphics card). During the installation setup, make sure to uncheck Nvidia GeForce Experience and both driver boxes, and expand the CUDA options so you can deselect Visual Studio Integration.
  5. Restart your computer. CUDA will have modified your PATH and set its own environment variables, so you should not need to worry about anything else relating to CUDA.
  6. Using this link, install the Microsoft Visual C++ Compiler for Python 2.7.
  7. Using this link, install the Build Tools for Visual Studio 2017, then restart your computer (this step may not be needed; the compiler for Python 2.7 may cover it, but I couldn’t verify).
  8. Download Visual Studio 2017 here. I downloaded the Community edition and it worked fine. Once that is installed, the installer shows a window of workloads. In this window, select the following:
     • Python development
     • Desktop development with C++. On the right there are arrows that let you expand/contract sections. Expand the section for Desktop development with C++ and select C++/CLI Support.
     • I also selected the IDE, as someone in one of the help articles said it was required, though they never opened it. I do not know if this is the case; feel free to experiment.
  9. After everything downloads and installs, restart again.
  10. Navigate to this directory: C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64. If this directory doesn’t exist, you will need to determine where your cl.exe was installed. If this directory does exist, add it to the PATH variable in your system environment, then restart your computer.
  11. Open PyCharm and your project. In Settings -> Project -> Project Interpreter, click the green +. Search for the package NumPy and install it. Then search for the package PyCUDA and install it.
  12. PyCUDA will take a couple of minutes to install, but if it installs successfully you can now use the latest version of PyCUDA to write GPU/CPU combo programs.
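
Before installing PyCUDA, it can help to confirm that the two compilers the steps above put on your PATH are actually visible. The following is a minimal sketch, not part of the original setup; the helper name missing_tools is made up for illustration, and it only checks that the executables can be found, not that they are the right versions.

```python
try:
    from shutil import which  # available on Python 3.3+
except ImportError:
    # Python 2.7 fallback with the same "path or None" behaviour
    from distutils.spawn import find_executable as which


def missing_tools(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # nvcc ships with the CUDA toolkit; cl is the MSVC compiler nvcc calls on Windows.
    for tool in ("nvcc", "cl"):
        location = which(tool)
        print("%-5s %s" % (tool, location or "NOT FOUND on PATH"))
```

If either tool is reported as not found, revisiting the CUDA install or the PATH step above is probably the place to start.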

Getting the Test Program Working

The first step was to try and see if I could get something running on my GPU. To do that I found a (not so simple) simple tutorial with the following source code (which I have modified to work with Python 2.7-3.6):

import numpy
import pycuda.autoinit  # initializes CUDA and creates a context on import
import pycuda.driver as drive
from pycuda.compiler import SourceModule

# SourceModule compiles CUDA C source with nvcc
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    // One thread per element; threadIdx.x is this thread's index in the block
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)

# Perform the computation: one block of 400 threads, one thread per element
multiply_them(drive.Out(dest), drive.In(a), drive.In(b), block=(400, 1, 1), grid=(1, 1))

# The difference from the CPU result should be all zeros
print(dest - a * b)
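
If the kernel syntax is unfamiliar, it may help to see what the GPU is doing written as ordinary Python. The sketch below is a CPU-only stand-in, not PyCUDA code: the function name multiply_them_cpu and its block_x parameter are invented for illustration, and the loop runs sequentially where the GPU would run the 400 threads in parallel.

```python
import random


def multiply_them_cpu(a, b, block_x):
    """Emulate the kernel on the CPU: one loop iteration per GPU thread."""
    dest = [0.0] * len(a)
    for thread_idx_x in range(block_x):  # stands in for threadIdx.x
        dest[thread_idx_x] = a[thread_idx_x] * b[thread_idx_x]
    return dest


# Same shape of problem as the sample: 400 random values, 400 "threads".
a = [random.gauss(0.0, 1.0) for _ in range(400)]
b = [random.gauss(0.0, 1.0) for _ in range(400)]
dest = multiply_them_cpu(a, b, block_x=400)
```

Each thread writes exactly one element, which is why the block size in the sample matches the array length. With a smaller block_x, the remaining elements would simply stay at zero.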

                After copying that code, I know I need to get a few libraries. I am using PyCharm as my IDE and it always gives me problems when I try to use its built-in repository search and install, so I did a quick manual install of the packages I need right in the terminal at the bottom of PyCharm. The first time I tried to install PyCUDA I got a dependency error; apparently PyCUDA relies on NumPy. This is no big deal; a quick manual install of NumPy looks like this:

Then I tried to install PyCUDA again and got another error:

Navigating to Microsoft’s website and downloading the Microsoft Visual C++ Redistributable under Other Tools and Frameworks should solve that problem:

Well, it turns out this still doesn’t work and a lot of the components of PyCUDA still won’t install. After a little digging it turns out it’s not the redistributable that’s needed but the build tools which I finally found here.

                I’ve had terrible luck with Visual Studio Installer in the past and was trying to avoid it, but this time everything worked out. Unfortunately, now we have some new errors when trying to install PyCUDA. Since following the dependency tree wasn’t working, I decided to try and find a few different install guides online. Unfortunately, most of these are old, written for old versions of Python, and many of them just can’t agree on what versions of things to install. Even the wiki linked by the PyCUDA documentation hasn’t had its Windows install instructions updated since Windows 7. There is now no guarantee that I will be able to get PyCUDA working at all, but it’s still worth a shot.

                To try and get things working, I decided to install the things they all tended to agree on. Nvidia’s CUDA is required, so I got that here. Visual Studio in general is required (some tutorials said you just needed the build tools and some guides said you needed the entire thing), so I downloaded the IDE portion and some of the workload options for C++ and Python development.

                Unfortunately, I am even hitting snags here. The CUDA installation fails with some unspecified error. A quick search reveals a potential solution which sounds like the problem I am having. Since this is my personal gaming PC, I already have a few of Nvidia’s tools installed, including drivers for my current card and GeForce Experience, so I deselected those components in the CUDA installer and tried again.

I made some progress. It’s still failing, but this time I can see where:

                It looks like (just like the individual in the solution I looked up) I am having issues with Visual Studio Integration. I am a bit frustrated at this point. The CUDA install alone takes almost five minutes per attempt, and so far, nothing is working. Visual Studio is listed as required for PyCUDA, so I know I will need it (and thus don’t want to uninstall it). There are also still a handful of issues reported during my PyCUDA install via pip that I haven’t gotten to yet because I am still fighting with basic components.

                Finally! After a quick restart I can now confirm that CUDA is installed and it has modified my environmental variables for me, so I don’t have to:

Now it’s time to try one more time to install PyCUDA manually in my PyCharm terminal:

Success! Now to try and run the sample code:

                Well, nothing else has worked out easily, so I’m not totally surprised. A quick search of that specific exit code yields a few posts about some other technology that I am not currently using. However, they do all have one thing in common: attempting to perform GPU computing. Even though we aren’t quite up and running yet, this is a good sign!

                I tried a few more tutorial code samples but always received the same exit code. This error code implies some type of access violation, meaning there is an issue accessing memory on the stack/heap. PyCUDA is basically a wrapper around some C++ code, and memory issues are classic C issues. I do recall that it was suggested Python 2.7 be used, but I am currently on 3.6. This is easy enough to fix in PyCharm.
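
Windows exit codes like this are often shown as large negative numbers, which are easier to search for once converted to the hexadecimal status form. The post does not reproduce the exact code, so the value below is an assumption used purely for illustration: 0xC0000005 is the standard Windows status for an access violation. The helper name to_ntstatus is also made up for this sketch.

```python
def to_ntstatus(exit_code):
    """Format a (possibly negative) Windows exit code as a 32-bit hex status."""
    return "0x%08X" % (exit_code & 0xFFFFFFFF)


# Illustrative value only: the signed form of STATUS_ACCESS_VIOLATION.
print(to_ntstatus(-1073741819))  # prints 0xC0000005
```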

                Changing to Python 2.7 means I must reinstall NumPy and PyCUDA. I thought this would be simple (it sort of wasn’t). I needed a way to use an older version of pip than what I had installed, to install an older version of modules that I already had installed, while keeping both (ugh). The solution was this command: py -2.7 -m pip install numpy (and the same again for pycuda). Unfortunately, after going through all of this, I now need a different version of the Microsoft Visual C++ tools. Pip gives me a nice link to this download. Luckily the fight wasn’t as long or hard this time: the download link worked, and I was able to get PyCUDA installed again for the older version of Python.
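
With two Python versions installed side by side, it is easy to install a package for one interpreter and then run the other. As a rough sketch (the helper name module_available is hypothetical), a script like this reports which interpreter is actually running and whether it can see the two packages. It should work unchanged on both 2.7 and 3.x.

```python
import sys


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


print("Interpreter: %s" % sys.executable)
print("Version:     %d.%d.%d" % sys.version_info[:3])
for name in ("numpy", "pycuda"):
    status = "installed" if module_available(name) else "missing"
    print("%-12s %s" % (name + ":", status))
```

Running it once per interpreter, for example with py -2.7 script.py, shows which installation each package ended up in.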

                I learned along the way that part of my PyCharm issues had to do with a specific version of pip. This wasn’t easy to fix in PyCharm: I ended up having to manually download pip 9.0.3 and replace the folder in my venv that held pip, then edit my pip-selfcheck.json to specify which version it was. I discovered this on my own while waiting for installs of various components, and after a quick test I was able to install packages and tools again in PyCharm like normal.

                Running the code again gives me another (of course) new error:

                At least this isn’t an access error again, and I do recall seeing something about nvcc being added to path both in one of the tutorials (they didn’t all include this, so I had no idea if it was necessary or not), and in an install error during one of the failed PyCUDA installs. Luckily this error was easy to search, and provided a simple solution here.

                Even this isn’t as simple as it should have been. Apparently (after some searching) Visual Studio 2015 and later versions don’t install cl.exe by default. So, I had to go back and modify the install like so:

                Since this requires a PATH modification, I also had to restart again. Though it’s not always discussed, I restarted almost every time a component was installed by anything other than pip or PyCharm, because almost everything modifies environment variables in some way. After I got this installed, I verified it was working by opening a Developer Command Prompt, because I couldn’t find cl.exe.

                So, I know I have it, I just need to find it. Turns out it’s here: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx86\x86. Just adding this to my PATH didn’t work. I found another guide here that suggested there was a script I could run that came with Visual Studio.

                I was able to find this and run it, but after another computer restart there was still no luck. I have no idea what to do at this point because despite being able to find cl.exe, I can’t get Windows to accept it in my environment variables.

                After a couple of hours of searching I have come to realize that this is an incredibly common, frustrating, and often unsolved error.

[Some Time Later]

                After many attempts, and many frustrating hours, I decided to try one more time to add cl.exe to my path. Just in case, I went in and re-navigated to the location I found earlier, copied the path, and put it into my path variable. I then restarted, and ran the program, voila:

                This is the expected output of the program (minus that warning, which for the moment I am going to ignore). The actual path variable I needed was C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64.
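
An alternative some people use instead of editing the system PATH is to adjust PATH for the current Python process only, before PyCUDA compiles anything. This is a sketch of that idea, not something tested in this post. It assumes nvcc is launched as a child process and so inherits the modified environment. The helper name prepend_to_path is made up, and the directory is the one reported above, which may differ on your machine.

```python
import os

# Directory reported in the post; adjust to wherever cl.exe lives on your machine.
CL_DIR = (r"C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools"
          r"\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64")


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        env["PATH"] = os.pathsep.join([directory] + entries)
    return env["PATH"]


if os.path.isdir(CL_DIR):
    # Do this before importing pycuda so the compiler step sees the new PATH.
    prepend_to_path(CL_DIR)
```

The change only lasts as long as the script runs, so it avoids the restart cycle but has to be repeated in every program that uses PyCUDA.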

                It turns out that even though many of the components I am using are 32-bit, the CUDA version I have installed is 64-bit, so that’s the version of cl.exe that I need. The problem is that when following the guides but using Visual Studio 2017 instead (which there are no guides for), there are 4 versions of cl.exe available. On a whim I tried setting my PATH variable to a different version, and the program compiled but generated some interesting errors that hinted at an architecture issue. So, I went back and sure enough, the first path I had tried to set was the x86/x86 version.
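
To see every copy of cl.exe at once rather than hunting through folders, a short directory walk can list them along with the host and target architecture taken from the folder names. This is a minimal sketch under a few assumptions: the function name find_cl is invented, the root folder is guessed from the paths quoted above, and the Hostx64\x64 style layout is the one Visual Studio 2017 uses.

```python
import os
import struct


def find_cl(root):
    """Return sorted (host, target, path) tuples for every cl.exe under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cl.exe" in [name.lower() for name in filenames]:
            target = os.path.basename(dirpath)                  # e.g. x64
            host = os.path.basename(os.path.dirname(dirpath))   # e.g. Hostx64
            found.append((host, target, os.path.join(dirpath, "cl.exe")))
    return sorted(found)


# Assumed install root, based on the paths quoted in this post.
VS_ROOT = r"C:\Program Files (x86)\Microsoft Visual Studio\2017"

print("This Python is %d-bit" % (struct.calcsize("P") * 8))
for host, target, path in find_cl(VS_ROOT):
    print("%-8s -> %-4s %s" % (host, target, path))
```

With a 64-bit CUDA install, the entry to look for is the one whose target is x64, which matches the Hostx64\x64 directory that ended up working here.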


                Now that I have a working version of PyCUDA, I want to do a little research into the possibility of moving some components of a fuzzer to the GPU to try and speed things up. Initially I thought this would be relatively simple, but I don’t get that impression anymore. As a future project I may attempt to get something simple running.

                Originally I wanted to know if I could use my GPU to open network connections and take advantage of the ability to leverage many more simultaneous connections to speed things up. For those interested, I found an interesting article here called GPUnet, written by a team from the University of Texas at Austin, that highlights some of the challenges of using a GPU for network systems and how it might be done. This may be an interesting future research topic worth pursuing when I have time again.
