A Look at the Ad Fraud Ecosystem

By Nick Schmidt

Overview

Today’s internet is powered by ads. They are the monetary powerhouse behind major websites and online content, and they are a useful tool for the content host, the content providers, and the viewer alike. A site like YouTube is a good example of this ecosystem. The content host, Google, makes its money by selling ad space to marketing agencies. The content creators, YouTubers, get a share of the ad revenue for drawing viewers to the site. And viewers, in exchange for being able to use the site free of charge, occasionally watch 15- to 30-second ads before a video begins.

Google’s ad network, AdSense, is a popular way for content providers to fund their sites. It is easy to apply for, and once accepted, a content creator can embed ads and earn a small amount of revenue from Google. There are a few ways to be paid for the ads on a site: cost per click, cost per thousand impressions, and cost per engagement. Cost per click pays a set rate each time a visitor clicks on an advertisement. Cost per thousand impressions is based on views instead of clicks; because clicks can be rare, advertisers are willing to pay simply for exposure. The final, and growing, bid type is cost per engagement. Here the visitor has to take some action, such as watching a video, to fulfil the requirement and earn the content provider money. Actual revenue is determined by a large number of factors, most of them beyond the content creator’s control. What the creator does control is the type of content hosted on the site. On average, generic content such as forums or social media pays very little. Content-heavy sites like blogs or news sites garner more ad revenue. At the top of the chain are commercial sites focused on selling the visitor a product. These can be a simple business webpage or something much more interactive, such as an online store.
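To make the three bid types concrete, here is a rough payout calculation. The traffic numbers and rates below are made-up assumptions, `estimated_payout` is a hypothetical helper, and the result is the gross amount the advertiser pays before the ad network keeps its share; real AdSense earnings depend on many more factors.

```python
def estimated_payout(model, rate, impressions, clicks=0, engagements=0):
    """Gross amount an advertiser pays under one bid type (hypothetical helper).

    rate is in dollars: per click for "cpc", per 1,000 impressions for "cpm",
    and per completed engagement (e.g. a watched video) for "cpe".
    """
    if model == "cpc":
        return rate * clicks
    if model == "cpm":
        return rate * impressions / 1000.0
    if model == "cpe":
        return rate * engagements
    raise ValueError("unknown bid type: %s" % model)


if __name__ == "__main__":
    # Made-up traffic and rates, purely for illustration
    impressions = 20000
    clicks = 200          # a 1% click-through rate
    engagements = 120
    print("CPC:", estimated_payout("cpc", 0.50, impressions, clicks=clicks))
    print("CPM:", estimated_payout("cpm", 2.00, impressions))
    print("CPE:", estimated_payout("cpe", 0.25, impressions, engagements=engagements))
```

The same traffic can pay very differently depending on the bid type, which is why exposure-based CPM bids exist at all when clicks are rare.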

My goal for this project was to register a website with AdSense and see whether I could get any sense of the metrics associated with ad revenue. Additionally, from a security perspective, I wanted to see how feasible it would be to create a bot that clicks on my site’s ads to generate revenue, how quickly Google could detect the bot, and what information it used to make that determination. Unfortunately, at the time of writing it has been a few weeks since I applied to join the AdSense program. After a few iterations of the site to meet Google’s standards, it seems the website has finally been accepted for review. However, the review process (even though Google states that it takes up to three days…) will not finish before this blog post is due. Instead, I will review some mechanisms that Google likely uses for bot detection, as well as a sample bot I’ve created with Selenium.

Detection

Google doesn’t publish exactly how it filters fraud; its click-protection information page consists of buzzwords such as ‘automated algorithms’ and ‘traffic analysis’. However, with some research (including into other ad agencies) and some reading between the lines, we can make educated guesses about how the large majority of fraudulent clicks are detected.

Clicks

The simplest and fastest way to detect clicker bots is to track the clicks on the page. The main idea behind this strategy is that bots and humans navigate a page in different ways. If you can identify how authentic users use a website, it becomes easy to filter out fake traffic. (1)

[Figure: click-density heatmaps comparing normal click patterns with bot clicks. Source: http://adage.com/article/digital/inside-google-s-secret-war-ad-fraud/298652/]

The center image shows clicks by a bot, concentrated on the edges of the page where ads are most likely to be placed. Compared with what normal click density looks like, this pattern would be easy for Google’s analytics to spot.
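Google does not publish its click-analysis logic, but a minimal sketch of the idea behind the heatmap comparison might measure how many clicks land near the page edges. The function names, the 15% edge margin, and the 80% threshold are hypothetical values chosen for illustration, not known Google parameters.

```python
def edge_click_ratio(clicks, width, height, margin=0.15):
    """Fraction of (x, y) clicks that fall within `margin` of any page edge.

    margin is a fraction of the page size (0.15 = outer 15% on each side).
    """
    if not clicks:
        return 0.0
    mx, my = width * margin, height * margin
    near_edge = sum(
        1
        for x, y in clicks
        if x < mx or x > width - mx or y < my or y > height - my
    )
    return near_edge / float(len(clicks))


def looks_like_edge_bot(clicks, width, height, margin=0.15, threshold=0.8):
    """Hypothetical rule: flag a session whose clicks hug the page edges."""
    return edge_click_ratio(clicks, width, height, margin) >= threshold


if __name__ == "__main__":
    bot = [(10, 500), (990, 500), (500, 20), (500, 980)]
    human = [(500, 500), (400, 600), (600, 400), (50, 500)]
    print(looks_like_edge_bot(bot, 1000, 1000))
    print(looks_like_edge_bot(human, 1000, 1000))
```

A real system would compare against the site's own historical heatmap rather than a fixed margin, since ad placement differs from page to page.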

 

Traffic Origin

A huge bottleneck for clicker bots (and what would probably have caught me) is the source of the ad traffic. Large amounts of ad traffic from the same address are a huge red flag. The next step in evading this detection would be to use a proxy or VPN service. However, Google, like other ad providers, tries to block proxies and VPNs as well. (2) I would guess that Tor is included in this ruleset too: Tor exit nodes are published (3), so a dynamic rule that blocks all traffic originating from them would be easy to maintain. Traffic analysis prevents a single host from causing any significant amount of fraud for ad providers. The only way to truly bypass this protection would be to control a vast number of IP addresses (think botnet).

One thing I was curious about, and did some reading on, was how this protection translates to the IPv6 address space. Because IPv6 blocks are fairly easy to come by, buying huge swaths of addresses for ad fraud would bypass per-address limits completely. From what I read, ad providers generally sidestep this by blocking ad traffic from the entire IPv6 address space. It’s a boring way to solve the problem, but effective.
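The Tor and per-address checks can be sketched together: parse the Tor exit list (each `ExitAddress` line in the file from reference (3) carries an exit IP) and flag any source that is a Tor exit or that clicks more often than a per-IP limit. The helper names and the limit of five clicks per address are assumptions for illustration, not Google’s actual rules; downloading the list is left out so the sketch runs offline.

```python
from collections import Counter


def parse_tor_exit_list(text):
    """Return the set of exit IPs from the Tor exit-addresses list.

    Each relay entry contains lines such as
    'ExitAddress 198.51.100.7 2017-05-01 11:05:00'.
    """
    exits = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ExitAddress":
            exits.add(parts[1])
    return exits


def suspicious_sources(click_ips, tor_exits, max_clicks_per_ip=5):
    """Flag source IPs that are Tor exits or exceed a per-IP click limit.

    click_ips holds one entry per ad click seen in some time window;
    max_clicks_per_ip is a hypothetical threshold.
    """
    counts = Counter(click_ips)
    return {ip for ip, n in counts.items() if ip in tor_exits or n > max_clicks_per_ip}


if __name__ == "__main__":
    # Offline sample in the same shape as the published list
    sample = (
        "ExitNode 0000000000000000000000000000000000000000\n"
        "Published 2017-05-01 10:00:00\n"
        "LastStatus 2017-05-01 11:00:00\n"
        "ExitAddress 198.51.100.7 2017-05-01 11:05:00\n"
    )
    exits = parse_tor_exit_list(sample)
    clicks = ["198.51.100.7"] + ["203.0.113.5"] * 10 + ["192.0.2.1"] * 2
    print(sorted(suspicious_sources(clicks, exits)))
```

The sample uses documentation-only IP ranges; in practice the exit list changes constantly, so it would need to be refreshed on a schedule.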

 

Click-Through Rate (Heuristics)

Another way Google can detect ad fraud is by noticing when a site’s metrics fall outside normal ranges. For instance, if a small blog has a click-through rate (the share of visitors who click on the ads) of 50% or more, that is a clear indicator of foul play. Other information that is hidden from the AdSense user is most likely collected and analyzed as well, to check whether a site is built for large amounts of ad fraud. Burying fake clicks in a sea of fake traffic would be an easy way around this check; however, the click-tracking and traffic-origin defenses above help stifle that. In this case, the action taken against the fraudulent site could range from chargebacks to demonetization, rather than simply refunding the advertiser for a single detected click.
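A click-through-rate sanity check of this kind could look like the following sketch. The 1% baseline is the rough average ad CTR quoted in this post; the 5x tolerance and the 100-impression minimum are hypothetical parameters, not known AdSense thresholds.

```python
def click_through_rate(clicks, impressions):
    """Share of ad impressions that turned into clicks."""
    return clicks / float(impressions) if impressions else 0.0


def ctr_is_suspicious(clicks, impressions, expected_ctr=0.01,
                      tolerance=5.0, min_impressions=100):
    """Flag a site whose CTR is more than `tolerance` times the baseline.

    Sites with fewer than min_impressions impressions are not judged,
    since a handful of clicks can swing the rate wildly.
    """
    if impressions < min_impressions:
        return False
    return click_through_rate(clicks, impressions) > expected_ctr * tolerance


if __name__ == "__main__":
    print(ctr_is_suspicious(clicks=600, impressions=1000))   # 60% CTR
    print(ctr_is_suspicious(clicks=12, impressions=1000))    # 1.2% CTR
```

The minimum-impressions guard reflects the point above about small sites: with very little traffic the rate is noisy, so a real system would likely combine it with the other signals rather than act on it alone.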

 

Selenium Example

What I wanted to test was whether it would be possible to create a website and automate clicks on my own ads. Selenium is a library for controlling a browser programmatically. For this example I chose Python 2.7 and the Chrome driver (ChromeDriver). (4) Other supported drivers are listed on the documentation website; there are differences between them, but I found Chrome the easiest to test with. The following script is a simple proof of concept that finds an element on the page, moves the mouse over it, and clicks on it.

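```python
import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Launch a visible Chrome window (ChromeDriver must be available, see (4))
driver = webdriver.Chrome()
driver.get("http://www.reddit.com/")
time.sleep(5)  # crude wait for the page to finish loading

# Find the element by its HTML id, move the cursor over it, then click
element_to_hover_over = driver.find_element(By.ID, "header-img")
print(element_to_hover_over)
hover = ActionChains(driver).move_to_element(element_to_hover_over)
hover.click().perform()

time.sleep(5)  # leave time to watch the new page load
driver.quit()
```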

The script first spawns an instance of Chrome with webdriver.Chrome(). This opens a full Chrome window in which you can watch the rest of the actions happen in real time. AdSense had not yet been approved on my personal site, so I am using Reddit as an example. The call to driver.get() navigates to Reddit, and the script then sleeps for five seconds to let the page fully load. The next few lines identify an element on the page by its HTML id attribute, move the cursor to that element, and perform a click action. Here the element we are clicking on is the icon in the top-left corner of the main page. The final sleep lets us watch the page load again, confirming that the click was successful, and then the browser closes.

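```python
import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Placeholder for my own site; driver.get() needs the full URL with scheme
url = "http://mysite.com"
driver = webdriver.Chrome()

# Reload the page and click the ad container until something goes wrong
while True:
    try:
        driver.get(url)
        time.sleep(5)  # wait for the page (and the ad) to load
        # "ad-location" is the id of the ad container on my page
        element_to_hover_over = driver.find_element(By.ID, "ad-location")
        hover = ActionChains(driver).move_to_element(element_to_hover_over)
        hover.click().perform()
        time.sleep(2)
    except Exception as e:
        # Stop on the first error, e.g. the element could not be found
        print(e)
        break

print("closing down")
time.sleep(5)
driver.quit()
```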

The example above is a similar bot in that it clicks on the ad location, but it also keeps re-requesting the base site so that it can click on the ad again. Assuming about 10 seconds per loop (loading the page plus the sleeps), that is about 8,640 page hits a day (86,400 seconds in a day divided by 10). If we went a step further and threaded this process, we could easily reach tens of thousands.

There are a few considerations to take into account with this approach. First, we are looking at a 100% click-through rate in this example. This would most likely be caught within the first few minutes of operation, since the average click-through rate for ads is about 1%; a distinctly higher rate, especially on a small site, is a huge red flag. Additionally, we have no way to vary the IP address our traffic originates from. Open proxies are an option, but they might be blocked. Something more interesting could be to tunnel through AWS and have our personal proxy change its address more often. Even so, given the amount of traffic we could generate, we would most likely need a much larger address pool.

One thing this example doesn’t show is how a web automation bot might be detected through characteristics unique to the automation software itself. A company called Distil Networks specializes in detecting automation tools. According to Jake (an employee I was able to talk to via their website), the product that is placed on a site looks for fingerprints associated with Selenium. (5) In our brief talk, he hinted at some JavaScript identifiers present in Selenium that give bots away. He also said that unless the site is running Distil’s product or another form of protection against automation tools such as Selenium, click fraud would most likely go undetected, because the ad providers themselves do not check for these attack vectors.

Conclusion

 

After looking at the ways Google and other ad providers detect ad fraud, as well as some of the techniques for trying to circumvent these mechanisms, I think the end result is as expected: ad fraud is definitely possible, but it isn’t really feasible on a small scale. To generate any meaningful amount of ad revenue you would need many different sites to use for fraudulent clicks, as well as a large number of IP addresses to proxy traffic through. The actual clicking is fairly simple to automate; however, the example used here would only work on dedicated machines. Botnets like Methbot, whose sole purpose is ad fraud, would need a smaller and more lightweight method to click on ads; installing Python, Selenium, and browser drivers on every machine wouldn’t be a feasible solution. So unfortunately, unless you want to be involved in some good ol’ organized cybercrime, Google AdSense won’t be funding your private site anytime soon.

(1) http://adage.com/article/digital/inside-google-s-secret-war-ad-fraud/298652/

(2) http://blog.clickcease.com/vpns-proxies-and-click-fraud

(3) https://check.torproject.org/exit-addresses

(4) http://selenium-python.readthedocs.io/installation.html#drivers

(5) https://www.distilnetworks.com/
