Not all IP addresses listed below are malicious hackers or spammers. I'm an aggressive admin who takes a dim view of bots other than those of the big three search engines (Google, Bing/MSN, Yahoo) and of social media sites like StumbleUpon, Digg, and Facebook.
My list bans bots that may be content scrapers or researchers. If they're not human and not related to a major search engine, I don't care what they are. Rule of thumb: if my site gains nothing from the transaction, and the transaction isn't a human being on a browser, ban it as a waste of system resources and a potential liability. At the very least, their leeching could slow my site's response time for legitimate human users.
Some humans were observed engaged in non-browser activity, in some cases attempts at modifying or exploiting our site. If a human isn't using a browser but is instead running an automated script that seeks to exploit the site, then that person does not need to visit the site at all.
My many and varied bot traps nail several IP addresses every week, but sometimes, just for fun and recreation, I scan for bots with my own eyes. My hands-on methodology is to examine my server's log file and note the obvious non-human activity. I identify bots by their actions and by other criteria that I will not reveal here, but anyone who has looked at a raw server log should understand what I am talking about. It is not difficult to determine who is human and who is a bot, because bots are typically very stupid and, well, robot-like in their behavior, which should not come as a surprise. The authors of these bots are not terribly bright either, judging by what I have seen.

Identifying a single IP address is not the end of the story. I want to know whether it belongs to an entire neighborhood full of bad bots, because in my experience that is often the case, and why ban one single IP when one can ban a /24 of 256 addresses, or more? It is more efficient to shut down bad IP ranges than to play the game one IP address at a time, which could result in a truly massive .htaccess in the end. I research Project Honeypot (I have been a registered user of that site for years and even authored a dark .css style for the web site) and related sites to determine the true nature of an IP address and the neighborhood in which it resides.
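To illustrate the arithmetic, banning a lone offender versus his whole neighborhood looks like this in Apache (the addresses here are documentation placeholders from RFC 5737, not entries from my actual list):

    # Ban one bad IP:
    Deny from 192.0.2.15
    # Ban its entire /24 neighborhood (256 addresses) with a partial address:
    Deny from 192.0.2
    # The same neighborhood in CIDR notation:
    Deny from 192.0.2.0/24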
I take a dim view of Russia and China. For one thing, these states do not value freedom of speech or enshrine democratic principles. Also, our site is in English, and I don't envision attracting any Russian or Chinese fans in this lifetime. Unlike some admins, I don't ban either country outright. But if I notice a large number of spammers in an IP neighborhood in Russia or China, I don't hesitate to ban the entire neighborhood rather than bother singling out individual IP addresses. I'm also skeptical of Poland and Ukraine, based on the many bad bots I've witnessed from those nations. Among all nations, I give China the shortest shrift, because of its poor reputation and poor compliance with standards. When you look up an IP address with WHOIS, Chinese accounts sometimes don't specify a small network neighborhood but offer a range that may encompass millions of IP addresses rather than just hundreds or thousands. Where China is concerned, if I detect numerous bad bots in a general neighborhood, and the WHOIS record refuses to specify a proper narrow range, then I will not hesitate to ban the millions. With other countries, I take a more refined approach, depending on my evaluation of the level of corruption in their governments. I am reluctant to ban entire neighborhoods in democratic nations, but if an IP range is reserved exclusively for web hosting, or what is termed a "Direct Allocation," then that is a red flag.
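To make the scale concrete, widening the network prefix is how a ban grows from hundreds of addresses to millions. The ranges below are private placeholder blocks used only to show the arithmetic, not lines to paste as-is:

    Deny from 10.1.2.0/24    # one neighborhood: 256 addresses
    Deny from 10.1.0.0/16    # a wider district: 65,536 addresses
    Deny from 10.0.0.0/8     # "ban the millions": 16,777,216 addresses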
I recommend the .htaccess security measures found on Perishable Press. That code requires testing and adjustment for specific sites, because it is more complex than a mere Deny-from. However, Jeff's code is valuable and well worth the effort of refining, because it adds another layer of protection in case attackers slip past the Deny-from barrier. I have integrated some of the intel from his excellent "5G Blacklist 2012" into my own .htaccess, but I do not include those bits here: the credit for them is his, not mine, and he maintains the best, most current version. I recommend visiting his site and merging his intel with mine to make a synthesis; an admin who does so may notice a reduction in leech and spam traffic. My .htaccess is a synthesis of my own work with that of Perishable Press. As for Wizcraft, which I used to recommend, I no longer use his intel, because he does not respond to emails, and some of his deny-froms have proven problematic, causing technical issues with WordPress, for instance. These days I use only Perishable Press intel and my own. My code is in use on a WordPress site right now.
Bear in mind that intel is perishable, given the changing nature of the Internet, and I'm not sure what its shelf life is. I'll try to keep it updated periodically, but there are no guarantees. Some admins grant addresses a reprieve on an annual basis; they worry that spammers will relocate and the IP addresses may then become legit. I don't feel that is a worthwhile concern for my purposes. I would rather continue blocking address ranges that have a lengthy history of hosting spammers and hackers. Say a spammer abandons or gets booted from an IP. The hosting company will rent the IP to somebody else, and chances are that sooner or later it will rent to another spammer, because a company lax enough to rent to one spammer is likely to make the same mistake again in pursuit of the almighty dollar. The only good IP addresses are those in use by end users of an Internet Service Provider and by major search engines such as Google. Anything else is questionable at best and likely to be used by spammers sooner or later.
You can pop this baby right into your .htaccess file if you admin an Apache server. Beware of Blogger's word wrap, though, which sometimes creates syntax errors in copied-and-pasted code. Note that the last Order directive in an .htaccess file is the one that takes effect. If you already have a Limit container in your .htaccess, add these Deny directives to it, but use only one Limit container.
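As a sketch of the overall shape, assuming Apache 2.2-style mod_authz_host directives (the ranges shown are placeholders, not my list):

    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    # All Deny lines live together inside this single container:
    Deny from 192.0.2.15
    Deny from 198.51.100
    Deny from 203.0.113.0/24
    </Limit>

With Order Allow,Deny, anything matching a Deny line is refused, and because of the Allow from all, everyone else is let in, which is the blacklist behavior you want here.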
4 comments:
Hi igor.. thanks for the code, I'm using it on my wapsite as well :-)
Well, it's a bit dated, but it should cut out a lot of the static from the Internet. The most severe restriction is actually for UserAgent ^$ and ^-$, since many spammer scum do not identify any UserAgent at all for fear of it being blacklisted. However, there have been rare issues with human users of arcane systems, such as in academic settings, who use something other than a normal Internet browser and get booted. I don't worry about that sort of thing, but if you have university professors experimenting on your site, then you probably should accept all callers anyway.
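For anyone wondering how that restriction is written, one common way to express it (a sketch, not necessarily the exact lines from my file) is with mod_rewrite, since %{HTTP_USER_AGENT} is the empty string when the header is absent:

    RewriteEngine On
    # Match a blank UserAgent (^$) or a literal bare hyphen (^-$):
    RewriteCond %{HTTP_USER_AGENT} ^-?$
    # Refuse the request with 403 Forbidden:
    RewriteRule .* - [F,L]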
Good list Igor
Quite a few in here I don't already have
Good riddance to the bad bots and IPs
I continue to derive pleasure from blocking spambots and attack-bots, although it is a time-consuming investigative process on my part.
I believe this .htaccess can reduce bandwidth consumption by a considerable amount, and every web site interested in saving money on its bandwidth quota and providing a more responsive experience to its human users should use it, along with the intel from the other sources noted above.