By Meghan Good
The concept of cross-site scripting (XSS) is nothing new to the security community. XSS is the ability to inject arbitrary code, disguised as user input, that is then executed in an unexpected and unpredictable fashion. There are three types of XSS: reflected, stored, and DOM XSS.[1] Many are already well aware of how the first two types work and how to exploit them. DOM XSS, however, is not as thoroughly understood. This post aims to educate those who do not know much about DOM XSS and to show that, as web browsers continue to develop, performing DOM XSS gets increasingly difficult; however, attack vectors such as outdated software and programmer error still exist.
DOM XSS occurs when a malicious payload is executed through modification of the DOM, so that client-side code ends up processing whatever the attacker chooses instead of the input it expected. If the payload is carried in the URL fragment (the part after “#”), the browser never sends it to the server, so the attack can go virtually undetected: the server never sees the payload, and the victim has little reason to notice it. All an attacker has to do is send the target a URL containing the malicious payload; upon clicking the link, the target is taken to a dynamically generated page that includes the injection.
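The fragment technique mentioned above relies on the fact that the part of a URL after “#” is not included in the HTTP request. The following is a minimal sketch of that split using the standard URL class; the host, path, and payload are made up for illustration.

```javascript
// Hypothetical attack link; example.com, welcome.html, and the payload are
// illustrative assumptions, not taken from the article's repository.
const link = new URL("https://example.com/welcome.html#name=<script>alert(1)</script>");

// What an HTTP client sends as the request target: path plus query string.
const requestTarget = link.pathname + link.search;

// What stays in the browser, readable by page scripts (e.g. via location.hash).
const fragment = link.hash;

console.log("sent to server:   ", requestTarget);
console.log("kept in browser:  ", fragment);
```

Because the payload lives only in the fragment, server-side logs and filters never get a chance to inspect it.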
Browsers, in response to security standards and to the threat of XSS in general, are constantly updating and improving their security. To counteract XSS, most modern browsers use URL encoding to help neutralize malicious code in URLs. In URL encoding, or percent-encoding, the browser does not keep reserved special characters in a URL as plaintext; it encodes, or “escapes”, them so that malicious text from the URL is not executed. The percent-encoding scheme converts each special character to its ASCII value (or to its UTF-8 bytes if the character is non-ASCII) and represents each byte in hexadecimal, preceded by a percent sign. For example, if the “<script>” tag was passed in as part of a URL, the browser would automatically encode the “<” and “>” to their corresponding values of “%3C” and “%3E”, resulting in “%3Cscript%3E” in the URL instead of “<script>”.
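The encoding scheme described above can be sketched in a few lines. The helper names percentEncodeChar and percentEncode, and the small set of “unsafe” characters, are assumptions made for this illustration; real browsers use larger sets that depend on where in the URL the character appears.

```javascript
// Percent-encode one character: take its UTF-8 bytes and write each byte
// as "%" followed by two uppercase hexadecimal digits.
function percentEncodeChar(ch) {
  const bytes = new TextEncoder().encode(ch);
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

// Encode only the characters in a small, illustrative "unsafe" set and
// leave everything else untouched.
function percentEncode(text, unsafe = "<>\"' ") {
  return Array.from(text, (ch) => (unsafe.includes(ch) ? percentEncodeChar(ch) : ch)).join("");
}

const encoded = percentEncode("<script>");
console.log(encoded);                        // %3Cscript%3E
console.log(encodeURIComponent("<script>")); // %3Cscript%3E (built-in equivalent)
console.log(percentEncodeChar("é"));         // %C3%A9 (two UTF-8 bytes)
```

In practice the built-in encodeURIComponent() does this work; the hand-written version only makes the byte-to-hex step visible.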
For someone trying to perform DOM XSS, this encoding tends to prevent the attack unless the attacker beats the percent-encoding by taking advantage of the encoding quirks that different browsers have. This is why most examples of DOM XSS in articles on the subject no longer work: browsers are constantly updating and improving, which makes breaking percent-encoding harder and harder. However, even an attacker with no knowledge of such idiosyncrasies in the target’s browser still has two possible attack vectors for DOM XSS: older browser versions and programmer error.
Security-minded individuals may not expect people to be running extremely outdated browsers, but it is a very common occurrence. In fact, some people use browsers so old that they do not perform percent-encoding at all, so special characters are not escaped by the process explained above. This leaves the user vulnerable to DOM XSS, especially if the website’s developers do not sanitize input before using it in code. The simple code examples available here: https://github.com/mgood15/auth_topic_DOM_XSS/tree/master/failures would not work in a modern browser, but would work in older versions of common browsers (such as Firefox 26). In the code (also shown in Figure 1.0), there is no data sanitization: user input from the URL is immediately processed and written to the web page.
Figure 1.0 – Above is the code of an unsuccessful example in a modern browser that performs percent-encoding correctly. The page does not “unescape” the user input and will therefore be dealing with percent-encoded input, which will not include special characters necessary for a simple “<script>” injection.
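The difference between the two browser behaviours can be sketched outside a browser. The snippet below is not the code from Figure 1.0; it is a minimal stand-in that assumes a hypothetical “name=” parameter and simply compares what the page would receive from a browser that percent-encodes and from one that does not.

```javascript
// Hypothetical page logic: take everything after "name=" and use it as-is,
// with no decoding and no sanitization.
function extractRaw(url) {
  const marker = "name=";
  const pos = url.indexOf(marker);
  return pos === -1 ? "" : url.substring(pos + marker.length);
}

// The URL as exposed by a modern browser that percent-encodes "<" and ">".
const modernUrl = "http://example.com/page.html?name=%3Cscript%3Ealert(1)%3C/script%3E";
// The URL as exposed by an old browser that skips the encoding step.
const legacyUrl = "http://example.com/page.html?name=<script>alert(1)</script>";

const modernInput = extractRaw(modernUrl);
const legacyInput = extractRaw(legacyUrl);

console.log(modernInput.includes("<")); // false: no tag can be formed
console.log(legacyInput.includes("<")); // true: the raw tag reaches the page
```

With the encoded input, writing the value to the page only displays the literal text “%3Cscript%3E…”; with the unencoded input, the same page logic would emit a working script tag.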
The other attack vector where DOM XSS still applies is programmer error. Website developers cannot avoid making mistakes, and several common ones leave users vulnerable to DOM XSS. The most common is decoding the percent-encoded URL and using the result as input for DOM manipulation without any sanitization. The browser’s encoding had escaped any potentially malicious characters; by decoding it, the programmer restores those characters and then uses the plaintext in code without first filtering it to ensure the integrity of the input. Despite being a serious security issue, this happens often in the real world, and it is easy to spot and exploit in multiple ways. Examples are discussed below and are also available (with additional examples) here: https://github.com/mgood15/auth_topic_DOM_XSS/tree/master/successes.
In a hypothetical situation, a developer sets up a page where a user can log in by typing in their username (the password field is out of scope for this article). The username is then passed to that user’s landing page, which displays the message “Hello, <username>!”. The developer decodes the username straight from the URL and, in the code shown in Figure 4.0, uses the same input unfiltered to write the greeting with the document.write() function. This creates a dynamically generated greeting message, but it also presents a huge opportunity for DOM XSS. Since the input is unsanitized, a malicious user can enter anything into the field, knowing that it will be decoded and then executed when passed to the DOM with document.write().
Figure 2.0 shows the input of an attacker for this hypothetical scenario, and Figure 3.0 shows the outcome of the request as processed by the code shown in Figure 4.0. When the script tags are passed in the “username” field, that input is seen as valid code and is then executed by the browser, resulting in the alert pop-up.
Figure 2.0 – Above shows user input for the landing page of the application. This input includes script tags with an alert of “This page is vulnerable to DOM XSS.”
Figure 3.0 – Above shows the result of the input from Figure 2.0. One can see the alert from the input being executed as code due to the script tags. In the URL bar, one can also see the unescaped code that was taken in when the input was received.
Figure 4.0 – Above shows the code that processed the page in Figure 3.0. The username is taken in from the landing page and is extracted through the document.URL.substring function. The username is then unescaped, which decodes the percent-encoding the browser performed, and the “realName” variable is immediately used in document.write() to render on the page with no prior filtering.
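The flow described in the caption for Figure 4.0 can be approximated as follows. This is a sketch, not the code from the figure: the “username=” marker, the example URL, and the fakeDocument stand-in (used so the snippet runs outside a browser) are all assumptions. unescape() is used because the caption says the value is “unescaped”; it is deprecated, and decodeURIComponent() is the modern equivalent.

```javascript
// Sketch of the vulnerable pattern. "doc" stands in for the browser's
// document object so the logic can be exercised without a browser.
function greet(doc) {
  const marker = "username="; // assumed parameter name
  const start = doc.URL.indexOf(marker) + marker.length;
  const username = doc.URL.substring(start);
  const realName = unescape(username); // undoes the browser's percent-encoding
  doc.write("Hello, " + realName + "!"); // written to the page with no filtering
}

// Minimal fake document that records what would be written to the page.
const fakeDocument = {
  URL: "http://example.com/landing.html?username=%3Cscript%3Ealert(%22This%20page%20is%20vulnerable%20to%20DOM%20XSS.%22)%3C/script%3E",
  written: "",
  write(markup) {
    this.written += markup;
  },
};

greet(fakeDocument);
console.log(fakeDocument.written);
```

The recorded output contains a complete script element, which a real browser would parse and execute as soon as document.write() emitted it.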
An attacker could also decide to pass HTML into document.write() and dynamically render their own HTML on the page. Figure 5.0 below shows an anchor tag with an href pointing to a Google search. In a real scenario, this link could point to a malicious site of the attacker’s choosing instead of an innocent Google search for puppies. Figure 6.0 shows the result of this request: a link on the page prompting the user to click. When clicked, the link redirects the user to the website referenced in the anchor’s href, which can be seen in the bottom left corner of Figure 6.0.
Figure 5.0 – Above shows the landing page where the user has input an anchor tag whose href points to a Google search for the word “puppy”. This anchor will be passed in, decoded, and then processed by document.write(). Because the user can pass in arbitrary anchor tags, the HTML is rendered to the page and a hyperlink is displayed, as shown in Figure 6.0.
Figure 6.0 – Above shows the result of the previous input from Figure 5.0. A hyperlink was dynamically generated on the page and, when the user hovers over the link, it shows a URL preview that points to a Google search for the word “puppy”.
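The anchor injection can be traced end to end in a short sketch. The base URL, the “username=” parameter, and the link text are assumptions for illustration; only the idea of an anchor pointing at a Google search for “puppy” comes from the figures.

```javascript
// Hypothetical vulnerable page and parameter name.
const base = "http://example.com/landing.html?username=";
const payload = '<a href="https://www.google.com/search?q=puppy">Click here</a>';

// Percent-encode the payload so it travels as a single query value.
const attackUrl = base + encodeURIComponent(payload);

// What the vulnerable page does: slice out the value, decode it, and build
// the string that would be handed to document.write().
const marker = "username=";
const value = attackUrl.substring(attackUrl.indexOf(marker) + marker.length);
const markup = "Hello, " + decodeURIComponent(value) + "!";

console.log(attackUrl);
console.log(markup);
```

The URL itself contains no raw angle brackets, yet the markup that reaches the page is the original anchor tag, which the browser renders as a clickable link.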
To make use of the scenario explained above, an attacker can generate this URL and send it to others. When a victim clicks the link, they are directed to the legitimate page they were expecting, but with the addition of the malicious link injected through DOM XSS. An attacker can even obfuscate the URL so that the link looks less like an obvious exploit, which may make the attack harder to detect.
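On the defensive side, the hole in this scenario closes once the decoded value is treated as text rather than markup. The following is a minimal sketch of one common approach, HTML-escaping the value before it is written; the escapeHtml helper is an assumption for this illustration and not code from the article’s repository.

```javascript
// Replace the characters that HTML treats as markup with their entity forms.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// The same decoded input as in the attack, now escaped before use.
const decoded = decodeURIComponent("%3Cscript%3Ealert(1)%3C/script%3E");
const greeting = "Hello, " + escapeHtml(decoded) + "!";

console.log(greeting); // Hello, &lt;script&gt;alert(1)&lt;/script&gt;!
```

In a browser, assigning the value to an element’s textContent property instead of calling document.write() has a similar effect, since textContent does not parse its input as HTML.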
DOM XSS, when leveraged properly under the right conditions, can be dangerous to users. Constant browser updates do tend to mitigate most DOM XSS attempts through percent-encoding, but we do not live in a perfect world where everyone runs the most recent version of their browser and developers always sanitize data. Because these attack vectors continue to exist, and may always exist, DOM XSS will remain an important problem that needs to be addressed.