
Master List of Known Bots

There is a downside to this:

What if there is some new start-up that is a webmaster-friendly search-indexing bot?
I would hate to deny the next viable challenger to elGooG ...
 

You'd have to just deny specific pages to that bot, or to most bots... but you could still serve up the content that you wanted indexed -- say, the public content. The only reason to block bots is to stop them from scraping content that your site generates or uses, like private user profiles, etc.
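
Rough sketch of what that can look like in nginx -- the /members/ path and the bot names here are just placeholders, swap in whatever you actually want kept out:

Code:
# block bots on a private area only; public content stays crawlable
location /members/ {
    if ($http_user_agent ~* "(AhrefsBot|SemrushBot|MJ12bot|PhantomJS)") {
        return 403;
    }
}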
 
Told ya;
This code kiddie was too dumb to forge the header, HA +1
PhantomJS -- had he forged the header right, I might not have noticed.
PhantomJS/2.0.1-development Safari/538.1
Code:
67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET /img/dog-affiliate_700.jpg HTTP/1.1" 200 166691 "https://domain.com/" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1"
67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET /js/wyd.js HTTP/1.1" 200 24 "https://domain.com/" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1"
67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET / HTTP/1.1" 200 444 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1"

barry@DS10:~$ ./ipinfo.sh
Pls enter your ip:
67.217.35.167
{
  "ip": "67.217.35.167",
  "city": "",
  "region": "",
  "country": "US",
  "loc": "37.7510,-97.8220",
  "org": "AS22458 NetSource Communications, Inc."
}
barry@DS10:~$

Another data center eliminated ...
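
For what it's worth, a quick nginx sketch of shutting that sort of junk out by user agent and by range -- the CIDR below is only an example, look up the ASN's real allocation before you block it:

Code:
# inside a server { } block
# drop headless-browser user agents without even answering (444 = close the connection)
if ($http_user_agent ~* "(PhantomJS|HeadlessChrome)") {
    return 444;
}
# and/or deny the data-center range outright (example range only)
deny 67.217.35.0/24;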
 
Just recently found some DMCA scanning bots on one of my sites.

Identifiable mainly by their user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/71.0.3578...

The other one is:
Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0

And last but not least:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100721 Firefox/3.6.8

The funniest ones were those that specified HeadlessChrome, like that's some sort of legitimate browser.

I think it's safe to say any version of FF lower than 50 or Chrome lower than 60 probably hasn't updated because it's a bot. Interesting note: lower versions of Firefox, like 38, are embedded browsers based on XULRunner, which is now obsolete/discontinued according to Mozilla... and the licensing deals with companies who own programming languages, like Oracle, are shaky at best, so people embedding XUL are going to be way behind in browser version. That version 3.6.8, lol, that's because they are using GeckoFX capabilities, which were discontinued even before FF version 15 I think.

I'm not going to specify the copyright watchdog companies that are using these -- it's bad for business considering SEO on this forum is pretty decent ;) -- but heads up, here's some more trash to add to the list.

If it's not the black hats, it's the white knights, but all of their bots are not welcome in me site, yar.
 
"^^I think it's safe to say any version of FF lower than 50 or chrome lower than 60 probably has't updated cause they're a bot. Interesting note,"
ya think so :D

Code:
    # character classes like [1-40] don't mean "versions 1 to 40", so the ranges are spelled out
    ~*Chrome/([1-3]?[0-9])\.       1;   # Chrome 0-39
    ~*Opera/([1-2]?[0-9])\.        1;   # Opera 0-29
    ~*=Mozilla                     1;
    ~*Mozilla/4\.[0-9]             1;   # ancient Mozilla/4.x
    ~*Firefox/x\.x                 1;   # literal fake "Firefox/x.x" UA
    ~*Firefox/([1-4]?[0-9])\.      1;   # Firefox 0-49
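
Those look like nginx map entries; for anyone wondering where they live, here's a bare-bones sketch of the wrapper (the variable name and the extra UA entries are just illustrative):

Code:
# in the http { } context
map $http_user_agent $bad_ua {
    default              0;
    ~*PhantomJS          1;
    ~*HeadlessChrome     1;
    # ... plus the version-based entries above ...
}

# then in the server { } block
if ($bad_ua) {
    return 403;    # or 444 to just hang up on them
}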

So that's what FF 3.x is about, 'dumb-runner'? Lots of out-of-date c0d3k1dd13s.

Google is still using Chrome 49 to check for older mobile (I see it occasionally); the IP is a Google IP (I checked).
 
Actually, Chrome/41.0.2272.96:
==============
Code:
66.249.66.28 - - [06/Jan/2019:02:13:20 -0500] "GET / HTTP/1.1" 200 6016 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.29 - - [05/Jan/2019:22:31:54 -0500] "GET / HTTP/1.1" 200 6010 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.30 - - [05/Jan/2019:17:55:22 -0500] "GET / HTTP/1.1" 200 6046 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.29 - - [05/Jan/2019:17:04:59 -0500] "GET /x.html HTTP/1.1" 200 2227 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.28 - - [05/Jan/2019:09:22:19 -0500] "GET /? HTTP/1.1" 200 5318 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.28 - - [05/Jan/2019:09:02:39 -0500] "GET / HTTP/1.1" 200 5304 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.74 - - [05/Jan/2019:08:12:22 -0500] "GET /x.html HTTP/1.1" 200 2213 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.70 - - [05/Jan/2019:07:52:29 -0500] "GET / HTTP/1.1" 200 5293 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.74 - - [04/Jan/2019:23:50:46 -0500] "GET /?x HTTP/1.1" 200 5318 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.72 - - [04/Jan/2019:23:30:32 -0500] "GET /x.html HTTP/1.1" 200 2233 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.70 - - [04/Jan/2019:22:29:30 -0500] "GET / HTTP/1.1" 200 5309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.74 - - [04/Jan/2019:14:57:46 -0500] "GET /x.html HTTP/1.1" 200 2212 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
 
Apache2? .htaccess?
If you have a VPS or dedicated server (with root) ...
In Nginx you can deny IPs or IP CIDR ranges like x.x.xx.0/16.
Example: block a whole network;
Code:
$ ipcalc 90.80.50.0/16
Address:   90.80.50.0           01011010.01010000. 00110010.00000000
Netmask:   255.255.0.0 = 16     11111111.11111111. 00000000.00000000
Wildcard:  0.0.255.255          00000000.00000000. 11111111.11111111
=>
Network:   90.80.0.0/16         01011010.01010000. 00000000.00000000
HostMin:   90.80.0.1            01011010.01010000. 00000000.00000001
HostMax:   90.80.255.254        01011010.01010000. 11111111.11111110
Broadcast: 90.80.255.255        01011010.01010000. 11111111.11111111
Hosts/Net: 65534                 Class A
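
The nginx side is then one line per range -- it can sit in the http, server, or location context, or in an included blocklist file (the path below is just an example):

Code:
deny 90.80.0.0/16;
# or collect ranges in a file and pull them in:
# include /etc/nginx/conf.d/blocklist.conf;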

Then there are firewalls: iptables, ufw ... that IP range gets no response -- lights out -- {stealth}
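
e.g. with the same /16 as above -- DROP means the packets simply vanish, no reply at all:

Code:
# ufw
sudo ufw deny from 90.80.0.0/16

# or raw iptables
sudo iptables -A INPUT -s 90.80.0.0/16 -j DROP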


ouch don't :p

% Information related to '90.80.0.0/16AS3215'

route: 90.80.0.0/16
descr: France Telecom SCE
descr: FT-SCE
origin: AS3215
remarks: -------------------------------------------
remarks: For Hacking, Spamming or Security problems
remarks: send mail ONLY to abuse -at --orange business.com
remarks: -------------------------------------------
mnt-by: RAIN-TRANSPAC
org: ORG-OBS3-RIPE
created: 2007-07-05T14:41:55Z
last-modified: 2010-11-10T17:03:45Z
source: RIPE
 
not really -- my ban lists are proprietary -- and updated at my leisure -- but manually :(
There is no realistic and reliable program logic to this -- too many variables. HTTP/1.0 is obvious. Many legit browsers cannot accept HTTP/2.0 yet. Server farms (data centers) are obvious.
I can curl past Cloudflare with one of my "residential IPs".
Most of this is intuitive -- that has always been the problem.

Free Proxy / VPN / TOR / Bad IP Detection Service via API and Web Interface | IP Intelligence -- this is interesting -- useful
 

Yeah, for sure it would remain proprietary. Your hard work would be protected. A WP plugin like this would be paid as well, not giving away your (manual, ouch!) work, just providing access to an API endpoint. You wouldn't even have to collab with me. I know there are plenty of WP sites that would pay for something like that to help protect their own data. Anyway, just an idea I had while reading through this thread ;) By all means take it or leave it.
 
That's an interesting idea -- however -- if you need to validate the user in an API -- you would have to limit the requests per month. SaaS tier pricing.
Otherwise, the plugin will be used on more than one blog if it's sold on a one-time, unlimited-use basis.

I use one or more of 4 databases (on my servers locally -- synchronized )
ip_block
asn_block
tor_block
geo_block

1-4 on a domain depending on the domain's level of security
only tor_block is a cron update (30 min)
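
The tor_block refresh can be as simple as a cron line pulling the public bulk exit list from check.torproject.org -- something like this sketch (the file paths are made up):

Code:
# /etc/cron.d/tor_block -- refresh the Tor exit-node list every 30 minutes
*/30 * * * * root curl -s "https://check.torproject.org/torbulkexitlist" -o /var/lib/blocklists/tor_block.txt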

The other thing is that checking against over a million IPs (maybe 30K rows/lines) takes time. Locally maybe ±50 ms -- remote? IDK, >400 ms?
It would slow down the initial page load, but you could set a browser cookie (only for that website) so each subsequent page would not make the request.

That link, Free Proxy / VPN / TOR / Bad IP Detection Service via API and Web Interface | IP Intelligence, is pretty good at detection and has an API -- he might be the right guy to talk with -- he is actively maintaining his database and getting a lot of user input via his API <<< I use it to screen with a shell script -- but after the fact, from my access log.

Code:
#!/bin/bash
# ipintel.sh -- ask getipintel.net how likely an IP is to be a proxy/VPN/bad IP (returns 0-1)

echo "Pls enter your ip:"
read ip
#whois -h whois.cymru.com "$ip"
#curl "https://ipinfo.io/$ip"
curl "http://check.getipintel.net/check.php?ip=$ip&contact=your@valid-email.com&flags=b"
echo " <- a result of 1 is real bad!"
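
And since it's after-the-fact anyway, the same check can be looped over the day's unique client IPs straight from the access log -- the log path and the sleep are just examples; the free API is rate limited, so don't blast it:

Code:
#!/bin/bash
# screen the unique client IPs from today's access log, one query each
awk '{print $1}' /var/log/nginx/access.log | sort -u | while read ip; do
    result=$(curl -s "http://check.getipintel.net/check.php?ip=$ip&contact=your@valid-email.com&flags=b")
    echo "$ip -> $result"
    sleep 2   # stay well under the free-tier rate limit
done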
 