Taking a Look at Jupyter Security

By Niklas Bernardo Correa

What is Jupyter?

Jupyter Notebook is an open-source web application geared toward writing and sharing “computational documents” and providing the user with an “interactive computational environment” in which to explore them. Picture a Google Doc that also runs any line of code you give it and displays the output in the document. That is, in very rough terms, a Jupyter Notebook.

Notebooks have become increasingly popular ever since they came onto the scene in 2014 out of the IPython project (which was essentially the same thing) and are the go-to tool for development and research in data science today. They allow data scientists to hack away at their data sets while developing their machine learning models in an easy and streamlined way. Data science is an expanding field, which means Jupyter and Jupyter-likes are bound to become even more popular. The idea of the computational document is also very interesting in its own right, and it is likely to spread beyond just data science.

Why bother?

Why would an offensive security-minded individual care about notebooks? I was motivated to look into them for the following reasons:

It’s popular, but not that popular

As mentioned previously, Jupyter is the de facto tool for prototyping in data science, and the underlying technology sometimes even makes it into production in one way or another. So it is definitely out there to be taken advantage of (as we will see later). But at the same time, not much has been said about Jupyter security outside of the users and developers themselves. Looking up “Jupyter security” returns results mostly from the Jupyter project’s own docs on proper security configuration. Just last year they started a GitHub repo that aggregates and annotates anything related to Jupyter security. The team is clearly on top of it, but there does not seem to be much interest from anyone in the security space. I thought this state of affairs might provide a chance to do something new and useful.

Code execution as a service

Jupyter offers a platform for users to experiment with code. To execute arbitrary code, one only needs to authenticate (if even that) and open a notebook to write in. Jupyter also offers a terminal through which to access the underlying host the server is running on. Sounds like a great convenience for users and very attractive to attackers.

Data

Where there is a Notebook, there is likely to be a lot of data. By gaining access to the Notebook server, one automatically also gains access to the data, whether it is sitting in the file system somewhere or hosted elsewhere in the victim’s environment. The data could consist of credit card transactions, which potentially means full card details, or maybe HTTP requests to and from a web app that could contain session tokens. A lot of possibilities exist.

Also, there could be a few notebooks that describe a machine learning algorithm that interprets the data sitting around. I do not know much about machine learning. If I was handed a notebook with some fancy algorithm I would not know what to do with it, honestly. But they do pay data scientists a lot of money to make them, so one that works should be very valuable to someone.

Computing power

Analyzing large data sets requires more than just algorithms; it also requires storage space and CPUs. If a Jupyter Server is being used in a professional setting, it most likely has significant resources allocated to it. This makes Jupyter Servers interesting targets for cryptominers or anyone in need of a beefy server.

The Jupyter use case

How Jupyter is typically used also makes it particularly interesting. Notebooks are most commonly found in the early stages of a project, where analysts are poking at their dataset and trying out approaches to tame it. Maybe there isn’t even a project to speak of yet. The flexibility provided by the notebook is what makes it the tool of choice for this scenario. This stage also needs to be as frictionless as possible, which most likely means the security around it is pretty lax.

Picture a data analyst hacking at a sample data set on his own laptop who comes up with something interesting. He needs to try it out with more data, but his laptop is kind of slow and the hard drive is sort of full, so he spins up an EC2 instance in his own AWS account (it has no 2FA enabled and uses the same password he uses for everything else, but it’s fine). He downloads more of the data set to the instance (it’s not such a good idea, but he will delete it afterwards) and installs Jupyter on it. He is at the point where he should get some input from his colleagues, so he exposes the server over plain HTTP to the internet and changes the default token authentication to password authentication. The password is ‘password’.

How many are out there?

To get a lay of the land, I went on Censys and typed “Jupyter” into the search bar. Not particularly accurate, yes, but it yielded results right off the bat:

The search returns around 28 thousand hosts, mostly on the default port 8888 or common HTTP ports, and mostly at cloud hosting provider IP addresses. Visiting a few manually shows mostly login pages asking for a password. Further scanning confirmed this. So people are changing the default token authentication to passwords, but not completely disabling authentication. Good for them.

But it is most interesting to find the cases where there is no authentication enabled at all. I spun up a Jupyter Server of my own and disabled authentication by setting a blank password and observed the behavior. It responds with HTTP 302 and redirects the browser to /tree. I looked in the response body for something that was unique to the authenticated section of the notebook. There are a few things, but I decided to go with the data-server-root attribute of the HTML body tag. I looked for unauthenticated servers caught by Censys using the query below:

services.http.request.uri: `\/tree?` and services.http.response.body: "data-server-root"

There aren’t many that Censys was able to find, but there are a few:

And it is possible to simply access them and use the terminal feature to access the underlying host:

However, the page displayed to the user can change, so the best way to fingerprint unauthenticated servers would be to make a GET request for an authentication-protected endpoint such as ‘/api/contents?type=directory’ and check for an HTTP 200 response status. This endpoint in particular returns JSON that lists the notebooks and folders in the Jupyter server root. But Censys scanners do not make such a request, so the method used above, which leverages the redirection behavior, was necessary. I ran more Censys queries to get a better understanding of how many Jupyter Servers were out there. Based on the observable behavior of Jupyter Servers, I came up with the following query:
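The endpoint check described above can be sketched in a few lines of Python, for use against a server you run yourself. This is only a sketch under stated assumptions: the function names and the base URL in the usage note are hypothetical, and it assumes an open server answers the contents endpoint with HTTP 200 and a JSON object, while anything else (an error status, or a redirect that ends at an HTML login page) means authentication is in place.

```python
import json
import urllib.error
import urllib.request

CONTENTS_PATH = "/api/contents?type=directory"  # endpoint named in the text


def is_open_response(status: int, body: bytes) -> bool:
    """Return True if a response to CONTENTS_PATH looks unauthenticated.

    Assumption: an open server answers 200 with a JSON object; any other
    status, or a body that is not JSON (e.g. a login page), is treated as
    protected.
    """
    if status != 200:
        return False
    try:
        return isinstance(json.loads(body), dict)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False


def probe(base_url: str, timeout: float = 5.0) -> bool:
    """Request CONTENTS_PATH without credentials and classify the answer."""
    url = base_url.rstrip("/") + CONTENTS_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_open_response(response.status, response.read())
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; classify those the same way
        return is_open_response(error.code, error.read())
    except OSError:
        # connection refused, timeout, DNS failure and similar
        return False
```

Calling probe("http://127.0.0.1:8888") against a local test server should return True only when authentication has been disabled. The classification is kept in its own function so it can be checked without any network traffic.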

services.http.response.html_title: "Jupyter Lab" or services.http.response.html_title: "JupyterHub" or services.http.response.html_title: "Jupyter Notebook" or services.http.response.html_title: "Jupyter Server" or
services.http.response.headers.unknown.name: "x-jupyterhub-version" or  (services.http.response.headers.location: `\/tree?` and services.http.response.status_code: 302) or (services.http.response.headers.location: `\/lab?` and services.http.response.status_code: 302) or (services.http.response.headers.content_security_policy: "report-uri" and services.http.response.headers.set_cookie: "_xsrf")

The query is based on the servers’ behavior when making GET requests for ‘/’. It looks for the redirections mentioned earlier, the default titles for the login page, and a combination of the Content-Security-Policy header with the now deprecated report-uri directive and a cookie called “_xsrf”, both of which are returned on the first request for the login page. The number of results was not much different from simply searching for Jupyter, however. Censys has a fingerprint of Jupyter Servers, and searching for it yields around 18 thousand hosts.
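To make the logic of the query easier to follow, here is the same set of signals written as a Python function that looks at a single response to GET ‘/’. It is a rough translation, not a reimplementation of Censys: the function name is made up, and the plain substring matching is my approximation of how the query terms match.

```python
import re

# Default page titles listed in the query
JUPYTER_TITLES = ("Jupyter Lab", "JupyterHub", "Jupyter Notebook", "Jupyter Server")


def looks_like_jupyter(status: int, headers: dict, body: str) -> bool:
    """Apply the signals from the query to one response for GET /."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Signal 1: one of the default page titles
    title = re.search(r"<title[^>]*>(.*?)</title>", body, re.IGNORECASE | re.DOTALL)
    if title and any(t in title.group(1) for t in JUPYTER_TITLES):
        return True

    # Signal 2: the JupyterHub version header
    if "x-jupyterhub-version" in h:
        return True

    # Signal 3: a 302 redirect to /tree? or /lab?
    location = h.get("location", "")
    if status == 302 and ("/tree?" in location or "/lab?" in location):
        return True

    # Signal 4: CSP with report-uri together with the _xsrf cookie
    csp = h.get("content-security-policy", "")
    cookie = h.get("set-cookie", "")
    return "report-uri" in csp and "_xsrf" in cookie
```

Each signal on its own is enough to return True, which mirrors the chain of “or” clauses in the query. Only the last one requires two conditions at once.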

Authentication-protected servers

It seems that most Jupyter Servers out there require authentication, however. What to do about them? The only answer at the moment seems to be to try to brute-force the password. There have been CVEs registered for Jupyter, but none that would enable an attacker to target a server and access a terminal on the underlying host. A lot of them affect the client-side components of Jupyter Notebook, Jupyter Lab, or JupyterHub, and include things such as XSS and CSRF vulnerabilities. I did not look into those during this project.

I was not able to find a vulnerability that would let me execute code without authenticating either. However, I did notice that Jupyter reflects the URL path and Referer header on the console without sanitizing terminal escape sequences. This opens up an attack vector of sorts, as there have been vulnerabilities reported in terminal emulators that could in theory allow for code execution, such as CVE-2021-27135 in xterm. But this is very far-fetched and barely worth mentioning, I believe. The days of terminals supporting escape sequences that write to files or execute commands have been over for around 15 years already, and the servers out there are most likely running detached from a terminal.

It does allow for more “stealthy” brute-force attempts, however, in cases where the target is running the server attached to a terminal. Placing the escape sequence ^[[2K^[[1A (erase the current line, then move the cursor up one line) in the Referer header or somewhere in the URL keeps login attempts from showing on the screen. They will still be written to the log, and will probably be seen eventually unless the log is only ever read by printing it to a terminal, which is unlikely. The images below show this in action. Only the final request, which submits the password, appears on the screen, as that line does not include the Referer. One could include an escape sequence as a query parameter there as well, although for some reason it does not erase the IP address from the screen.

It’s not much, I know. But maybe you never gave terminal escape sequences any thought and now you have! Next time you review source code for a script or tool you downloaded from the internet (which you do diligently every time), don’t just cat the file to the terminal. Read it in a text editor or use cat -v to display escape sequences.
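If you want to see what cat -v is doing for you here, the idea is easy to sketch in Python. The function below is a simplified stand-in, not a port of cat (it ignores the M- notation for high bytes, and its name is made up). It rewrites control characters in caret notation so that ESC [ 2 K (erase line) and ESC [ 1 A (cursor up) become visible text instead of being acted on by the terminal.

```python
def show_control_chars(text: str) -> str:
    """Render control characters in caret notation, in the spirit of `cat -v`."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            out.append(ch)                      # keep line breaks and tabs as they are
        elif code < 0x20:
            out.append("^" + chr(code + 0x40))  # e.g. ESC (0x1b) becomes ^[
        elif code == 0x7F:
            out.append("^?")                    # DEL
        else:
            out.append(ch)
    return "".join(out)


# A Referer value carrying "erase line" followed by "cursor up one line"
referer = "\x1b[2K\x1b[1A"
print(show_control_chars(referer))  # prints ^[[2K^[[1A
```

The same treatment applied to anything a server echoes to a console (URL paths, headers) would take the trick away entirely.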

Simple, but effective

While I was writing this article, Aqua Security wrote a blog post about an attack against a Jupyter Notebook honeypot they set up, where the attacker tried to use a ransomware Python script to encrypt files in a particular directory on the server. The attacker’s MO was really simple. He accessed the unprotected server and used the terminal feature to download some additional libraries required by his script. He then just copied and pasted the script, saved it to a file, and ran it. Seems like looking around for unprotected servers really is the state of the art in Jupyter hacking currently. Not very exciting, but if it works then it works. Unfortunately, I was not able to push the frontier significantly :/.

Recommendations

Protecting your Jupyter server is really all about keeping authentication enabled, either by using the token generated by the server at startup or by switching to password authentication and choosing a strong password. Using a firewall to whitelist the external IP addresses that may access the server is also highly recommended. If the server is hosted at a cloud provider like AWS, setting up security rules through them is probably the most convenient way. Alternatively, one could host their server in an AWS SageMaker instance. In SageMaker, authentication to the server is handled by IAM, but hosting ML development infrastructure on SageMaker is a larger business decision, and placing the server behind IAM is not enough on its own to warrant that kind of change. If you are a hobbyist or student, though, it is an option worth considering.
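On the “strong password” part: one easy way to avoid ending up with ‘password’ is to not invent the password yourself. The snippet below is just a sketch using Python’s standard secrets module. The helper name and the choice of 24 random bytes are my own, not anything Jupyter requires.

```python
import secrets


def generate_password(num_bytes: int = 24) -> str:
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


# 24 random bytes encode to a 32-character string
print(generate_password())
```

Store the result in a password manager and share it with colleagues over a channel you trust, rather than picking something memorable.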

Well, that’s it! Thank you for reading 🙂
