Saturday, January 24, 2015
Sux to be a Spammer Day
Today is Sux to be a Spammer Day at techlorebyigor. I've updated my IP blacklist with all my latest catches from the sea of spambots.
Monday, June 10, 2013
Wordpress Security Vs. Wordpress Search Ranking
Wordpress security is sometimes at loggerheads with a site's search ranking. Many tricks and tips recommended by security wonks will actually decrease a site's search ranking, such as banning all hits to xmlrpc.php or disallowing various paths in robots.txt. I've experimented over the last several days and learned what works and what is counter-productive. I do not believe it is wise to ban hits to xmlrpc.php, and I do not think web admins should second-guess Google when it comes to directing robots. Google knows what it is doing, for the most part, and additional rules make Google angry, in a manner of speaking. I watched my site plummet from #2 in search rankings for a particular term to #5 after adding a lot of rules to robots.txt. Needless to say, I yanked those rules right out!
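For illustration, these are the kinds of robots.txt rules the security guides recommend, and the kind I ended up yanking; a representative sketch, not the exact rules I tested:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php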
There is such a thing as having enough or even too much security. With regular backups of the database and the files, I am not inclined to follow all of the recommendations set forth by Perishable Press, one of the few sites I regularly follow. I view Perishable's advice as guidelines and educational material. The author has a knack for explaining technical issues without resorting to jargon, with a humorous style reminiscent of Stephen King--the American vernacular, gotta love it--and he offers excellent examples of .htaccess. His is my "go-to" site when I am confused about arcane .htaccess syntax, which is often, because .htaccess syntax is unintuitive. I use some of his security tips, but not all, because some cause problems. I am also concerned that other problems may be created that I cannot detect, problems that may become evident in the future after I add a new plug-in or a new update to Wordpress arrives.
Perishable's .htaccess code is sometimes compressed in a way that makes it difficult to debug or understand what is being done. Perhaps that is a form of showing off or maybe the intention is for the code to execute faster, but I'd prefer to sacrifice efficiency for readability and ease of maintenance.
I am no stranger to compressing code. I won a little contest back in the '80s, getting my name and program published in a national magazine. The challenge was to code a BASIC program that did something cool in only one line. Each BASIC statement could be separated by a colon (:), and GOTO 0 was allowed. But was this a useful or helpful skill? Maybe. This sort of experience may have helped me become a better maintainer of other people's spaghetti-code programs, which comprised a large portion of my career. I rarely had difficulty finding and fixing bugs.
I think Apache wrote the language for .htaccess back when every byte mattered, and in order to save a couple bytes, they made the language cryptic and anti-human. I much prefer languages such as COBOL, batch/script, or BASIC for their sheer readability. I never was a fan of C++, even if it is twice as fast. In my opinion, buy a faster computer, if you need speed. When programming languages are easier to understand and to code, then greater deeds may be wrought by human minds and with far fewer bugs. That's my philosophy about programming. I have indeed worked with extremely cryptic computer programming languages--assembler, no less. I am merely stating my own preference as a programmer and user. It's nice to be able to look at source code and figure out what is going on in just a few moments. Maybe my opinion does not dovetail with job security for those programmers already entrenched in cryptic languages, but it seems rather obvious to me.
Saturday, June 1, 2013
Blexbot Content Scraper is Really Nielsen Media Research
I had great difficulty finding detailed information online about an IP address, 216.176.177.162, that appeared in my site log over ten thousand times. But now that IP address is cold busted. It belongs to Nielsen Media Research, a pack of content scrapers. They do not wish to be identified as such, and so they lie, and call themselves a random name like Blexbot. Tomorrow they will be clexbot, and the day after that, wmu-bot. What are Content Scrapers? They are greedy bots that attempt to grab every piece of data from a given site. Interesting bits of this data are then grouped together and sold to companies, governments, or individuals. In short, they grab content and try to profit from it. They do not send traffic. They should be banned by every site, no question about it.
Lookie what the scumbags are doing on a Wordpress site:
216.176.177.162 - - [29/May/2013:06:21:13 -0800] "GET /password HTTP/1.1" 404 2438 "-" "BLEXBot"
216.176.177.162 - - [29/May/2013:06:21:16 -0800] "GET /signup?context=webintent HTTP/1.1" 404 2438 "-" "BLEXBot"
216.176.177.162 - - [29/May/2013:06:21:18 -0800] "GET /reg/join HTTP/1.1" 404 2401 "-" "BLEXBot"
216.176.177.162 - - [29/May/2013:06:21:21 -0800] "GET /forgot_password HTTP/1.1" 404 2438 "-" "BLEXBot"
They're not just content scrapers, they're malicious hackers. Those 404s you see above? That code means they're making up links as they go along, running them up the flagpole to see if anybody salutes. Meanwhile, the web admin gets to have fun wondering what's wrong with his web site that all of these 404 errors are popping up. (There were many more than just the above examples.)
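Banning a pest like this in .htaccess is straightforward. A minimal sketch, assuming an Apache server; since the user-agent name changes from day to day, the IP ban is the dependable half:
#Ban the Nielsen scraper by IP address and by its current user-agent
Deny from 216.176.177.162
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC]
RewriteRule .* - [F,L]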
Bot-Net Attack? What Bot-Net Attack?
I read many articles today about the brute force attack targeting Wordpress sites. My site is secure, and I just laugh at the enormous waste of that stupid bot-net's bandwidth. Each hit taxes my site about 500 bytes, so those scumbags will have to hit my site 2,000,000 times in order to waste one of my gigabytes--and even that calculation seems rather liberal to me. After all, my deflate instruction is near the top of my .htaccess file, so I would wager that instead of 500 bytes, the server actually transmits closer to 300 bytes per bot, maybe fewer, since old 403.html is, after all, mere text, which compresses extremely well under any compression algorithm worth its salt.
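For reference, the deflate instruction I mean is something along these lines; a sketch, as my actual .htaccess is not reproduced here:
<IfModule mod_deflate.c>
#Compress text responses, including the 403 page, before they leave the server
AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript
</IfModule>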
But igor's solution will never be the thing people click on in Google. Packaging and appearance are the thing. That is all right, because it is enough for me that my client's site is perfectly impregnable. I want his site to be fast all the time, I want it to look right all the time, and I want black hat hackers and evil bots to fail in everything they attempt.
Upon reflection, I think the stupid brute force attack against wp-login is meant to promote the sales of some cybersecurity firm or firms. Let us be clear: it is not a serious attack. It is a stupid and ineffectual waste of bandwidth. Some cunning CEO may have decided to hire a bot-net to launch a stupid, ineffectual attack against everybody, knowing that the ignorant and the easily frightened would shell out money to buy a quick fix, a little band-aid to put on their precious web site to lull them into a false sense of security. I just don't know which companies are behind the attack and stand to gain. There are probably a thousand different suspects.
Wednesday, May 29, 2013
Attack-bots Hitting Wp-Login on Wordpress Sites
I've noticed in my log recently that thousands of bots have been hitting wp-login.php repeatedly, despite being served 403 pages. I am not sure of the motivation of the attackers. However, thousands of hits on a .php file certainly can be a drain on system resources.
I developed a simple method of reducing the impact of wp-login attackers. After my deny-froms, I placed the following code in my .htaccess file. It is useful for Wordpress sites that do not permit users other than the administrator to log in, and where the admin uses a static IP address, which is an ideal scenario for security purposes. I should note that wp-login is specifically disallowed in my robots.txt and that there is no link to it on the Wordpress site in question. Thus, my code will not ensnare rule-abiding bots such as Google's.
My code is not applicable to all Wordpress sites. Some WP sites let users register and log in. I opted not to go that route, because our site is such a small one that I doubt anyone would remember their password. Our users can leave a comment by logging into a popular social media site.
#Block WP attackers
RewriteEngine on
RewriteBase /
RewriteCond %{REMOTE_ADDR} !^www\.xxx\.yyy\.zzz
RewriteCond %{REQUEST_URI} ^/wp-login [NC,OR]
RewriteCond %{REQUEST_URI} ^/wp-admin [NC,OR]
RewriteCond %{REQUEST_URI} ^/install\.php [NC]
RewriteRule .* - [F,L]
Place any static IP addresses that admins use in the above code (where www.xxx.yyy.zzz is). The code should exclude the IP addresses of legitimate users--admins--who log in to the site. One could exclude multiple IP addresses by adding more conditional lines.
The first conditional statement checks the IP address. If it does not match (indicated by the exclamation mark), and the user is requesting the wp-login, wp-admin, or install page, then that user is served the 403 page. All of this happens without engaging the database or invoking any PHP code, so it is fast and efficient and minimizes the toll of the attack bots on system resources. I have banned the IP addresses of the vast majority of these attackers, but I notice a certain percentage do slip through with novel IP addresses, so this is a way of preventing them from forcing the server to load and interpret wp-login.php.
My 403 page consists of a mere 500-odd bytes with links intended to tempt bots to visit various spam-bot hells around the Internet, where they may encounter honeypots, investigators, bogus email addresses, bogus links, and in general waste a lot of their time and effort and generate no data of any use at all to them.
Tuesday, October 2, 2012
What Should I Put on My 403 Page?
Your 403 (Forbidden) page should be reserved for bad actors: hackers probing your web site and spambots. I suggest sending them to harvester hells around the web, as in the code below. Any spambot that goes to those places may absorb bogus email addresses, get identified by honeypots, or waste time spinning its wheels.
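The original snippet did not survive here, so the following is a minimal reconstruction of the idea; the message text is only an example, and the links are placeholders to be pointed at real honeypot and harvester-killer destinations:
<html>
<head><title>403 Forbidden</title></head>
<body>
<p>Get lost, you parasite. Help yourself to these goodies on the way out.</p>
<!-- Placeholder links; substitute actual harvester-killer sites -->
<a href="http://example.com/email-list">free email lists</a>
<a href="http://example.org/directory">member directory</a>
</body>
</html>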
The above is a harsh message to display to humans, so you had better be sure that it is not possible for an innocent user to accidentally trigger the 403 page. I soften the text for most of my web sites and make it civil, because there is a chance that some kind of unforeseen event could trigger a 403. However, if your web site has received a lot of hacker abuse in the past, then this wad of sputum may very well be what you want. I composed the message after one of my sites got hacked, an event which also caused me to devote many hours to learning about web site security.
Sunday, September 30, 2012
Hardening Wordpress: An Explanation
Wordpress recommends hardening security by deploying the code below in your .htaccess file:
These are good rules, and I include them in my .htaccess, but I noticed there was no explanation offered for their effects, which complicates combining these rules with other rules. I prefer to understand what is going on, so I performed some research that I will now share with others.
# Block the include-only files.
RewriteEngine On
RewriteBase /
RewriteRule ^wp-admin/includes/ - [F,L]
RewriteRule !^wp-includes/ - [S=3]
RewriteRule ^wp-includes/[^/]+\.php$ - [F,L]
RewriteRule ^wp-includes/js/tinymce/langs/.+\.php - [F,L]
RewriteRule ^wp-includes/theme-compat/ - [F,L]
# BEGIN WordPress
Let us examine the code line by line until everything unique is explained. In the first place, any line beginning with # is a comment, or remark statement to us old-school programmers who cut our teeth on BASIC. A remark statement is intended for human comprehension, to assist our feeble brains, and is ignored by the Apache server.
RewriteEngine On tells Apache, "Hey, start the engine, we have some rules on the way." Apache allocates resources in order to handle the rules.
RewriteBase / causes any evaluations that follow to be relative to the site root (e.g., techlorebyigor.blogspot.com/), in order to avoid having to specify the full path on each and every condition and rule that follows.
RewriteRule ^wp-admin/includes/ - [F,L] scans for anyone attempting to access anything beginning with (denoted by ^) wp-admin/includes/, and the reaction will be [F,L], which means "Forbidden, and last--skip all remaining rules." Forbidden means the user gets the 403 page instead of the requested resource on this one occasion.
RewriteRule !^wp-includes/ - [S=3] is a special command in two ways. First, it uses NOT logic, denoted by the ! symbol. It instructs Apache that if the user's request does not begin with (^) "wp-includes/" then [S=3], which means skip the next three rules. S is like a GOTO statement providing a primitive form of IF...THEN logic, such as I have to use in my batch files. The reason this line is included is for speed of execution. If "wp-includes/" is not present in the request, then clearly the three rules that follow will not apply and by avoiding them, time is saved.
The other lines should be self-explanatory. The main area that I did not understand this morning was [S=3], and I did a bit of digging to unearth that information. The [S] command is not often seen and certainly optional in the above code, but such concern over efficiency is the mark of a good programmer.
I wonder whether I can replace the RewriteRules with RewriteConds for improved efficiency, but the [S=3] line makes me doubt whether the Condition statements would be more efficient after all.
Wednesday, September 26, 2012
Tightening WordPress Security
I have developed a method to tighten Wordpress security for a scenario in which there is only one admin, and another admin will never be added.
Since my Wordpress blog only has 1 admin, there is no legitimate reason for any human being to ever access wp-signup.php. Anyone who does is hacking, so they should be banned forever.
This code in .htaccess will redirect anyone who attempts to sign up to a bot-trap located in /kick/. An alternative for those who do not have a bot-trap installed would be to Redirect 403 /wp-signup.php. I recommend installing a bot-trap to allow your site to dynamically respond to attackers by banning their IP addresses. This will slow down hackers attempting to probe your site for vulnerabilities.
#Ban WP attackers
Redirect 301 /wp-signup.php /kick/
Additional code, below, scans for common hack attacks against WordPress installations that I have observed firsthand in my server log. Again, because I do not permit registration, I ban it. Why permit something that no human being will ever use?
RewriteCond %{QUERY_STRING} action=register [NC,OR]
RewriteCond %{REQUEST_URI} ^/timthumb [NC,OR]
RewriteCond %{REQUEST_URI} ^/uploadify [NC,OR]
RewriteCond %{REQUEST_URI} ^/marketplace [NC]
RewriteRule .* http://techlorebyigor.blogspot.com/kick/? [R=301,L]
The question mark at the end of the above RewriteRule truncates any query string that was used, avoiding potential complications if the bot-trap is activated. Bots that attempt to find exploits in the WordPress plug-ins timthumb, uploadify, or marketplace will be banned. I do not use those plug-ins; if you do, then you had better not use the above code.
Other useful snippets, ubiquitous on the web and not original, follow:
Although my wp-config.php is already locked down tight with a file permission of 400, I decided to add an additional layer of security in .htaccess. Why? Just because. Perhaps it is unnecessary, but I like it. There is no such thing as redundant security.
<files wp-config.php>
order allow,deny
deny from all
</files>
<FilesMatch "\.(htaccess|htpasswd|fla|psd|log|sh|gz|zip|tar)$">
Order Allow,Deny
Deny from all
</FilesMatch>
There is no legitimate reason for any human or bot to be reading any file with the above extensions. Although I won't ban any who do, I will show them my special 403 page, which has many links to harvester-killers on other web sites.
Friday, July 15, 2011
A Few Words about Wordpress Security
A recent widespread attack that has damaged many Wordpress blogs exploited the file permission of wp-config.php. The permission for that file absolutely must be 400 or 440. Search Google for yassine edder, a scumbag running an automated script out of Tunisia. The hacker, whom I will henceforth call "Asinine," hacked a friend of mine, who was terrified of losing everything. I worked for three hours to analyze and then undo every last bit of the damage. But now I know some things about Wordpress security. And I have added tens of thousands of IP addresses in Tunisia to my blacklist, just in case Asinine hops over to a different cafe.
I cannot stress enough the importance of setting the file permission of wp-config.php. Lock it down tight. Don't delay, do it today.
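On a typical shared host, this is a one-line job from the shell, or the equivalent permissions dialog in your FTP client:
chmod 400 wp-config.php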
No one, and I mean no one, should install Wordpress without first becoming very familiar with the security requirements. There are precautions that should be established prior to going public with a site. Setting the file permission of wp-config.php is #1 on the list. Until it is set in a proper manner, the site can be hacked by any idiot from here to Tunisia.
Make regular backups of your Wordpress site. I prefer using the excellent Snapshot Backup Plugin for Wordpress by Jay Versluis. I don't know whether he is any relation to the Versluis who created the excellent HV Menu, but such a connection can only be flattering. Indeed, the reason I downloaded the plugin was because of the name recognition.
I use .htaccess rules to secure the archive files on my Apache server. This will prevent unknown parties from downloading archive files, which remains a security risk until or unless the archive is deleted.
Copy and paste the following into the existing .htaccess in the wp-content directory or create .htaccess there if it does not already exist.
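The snippet itself did not survive here; the following is a reconstruction matching the description below, using the same old-style Apache directives as elsewhere on this page:
<FilesMatch "(sql|old|ini|bak|gz|log|tar)">
Order Deny,Allow
Deny from all
Allow from 111.222.333.444
</FilesMatch>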
The above code uses a whitelisting strategy. Replace the IP address 111.222.333.444 with your own static IP address. The code will prevent anyone from downloading the .tar file--or any file with the text "sql", "old", "ini", "bak", "gz" or "log" in it--except for someone at the specified IP address. If placed into the .htaccess in wp-content, it will control access for all files and directories within wp-content. It does not affect the parent of wp-content.
Another way to protect archives, instead of using a whitelist, would be to demand that the downloader enter a password. This is also possible to do in .htaccess, but I went with the whitelist, because it's more convenient for me.
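The password route is standard Apache basic authentication; a sketch, assuming an .htpasswd file has already been created (the path is a placeholder):
AuthType Basic
AuthName "Archives"
AuthUserFile /home/example/.htpasswd
Require valid-user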
Incidentally, the same whitelisting strategy is highly effective for the .htaccess located in the wp-admin directory. Do not allow anyone except one IP address to access the administration log-in. This will lock down security on your Wordpress site and ward off brute-force attacks and other games hackers play. It could be adapted for sites with multiple admins, as long as the IP address of each admin is known and remains static. Could be a problem with a mobile admin, though!
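A minimal sketch of that wp-admin whitelist, again with 111.222.333.444 standing in for the admin's static IP address:
Order Deny,Allow
Deny from all
Allow from 111.222.333.444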
I wonder who traxodone@gmail.com is? That individual sent me an email mere hours after I had posted this:
Hi Igor,
I've find your blog through Google and I hope you can help my. My blog is hacked by this guy from Tunisia, how can I restore my blog and password for wp admin?
Kind regards,
traxodone
I wrote back asking for more information, such as the blog ID and some reasons I should volunteer my assistance. No response. Well, I can't help anybody that does not communicate. Said individual may well be the hacker responsible for the attacks.
Saturday, April 30, 2011
Using .htaccess to Redirect Renamed or Deleted Web Pages, Forums, and Directories
A web admin may have a good reason for renaming the stray page or directory on a web site, but doing so impacts search engine optimization. One wouldn't wish to greet visitors or search engines accessing old links with a clumsy 404 page. An elegant solution is to redirect access attempts to a new page. For a simple rename, this is easy enough:
redirect 301 /Atheism.html /atheism.html
Any hits on Atheism.html will be redirected to atheism.html. This is clear, economical code, the best choice for a simple scenario like that. But what about the case of deleted forums with variable links, such as /messages/techforum/posts/ followed by many links such as 2009/12/75433.html? A common desire is to redirect many links in a directory to a single directory or file. The 301 technique no longer serves in that scenario. My solution is to use Apache's RewriteCond command, which requires the following line somewhere near the beginning of your .htaccess file:
RewriteEngine on
With that prerequisite in place, one can then use RewriteCond and RewriteRule to redirect requests for /messages/techforum/posts/2009/12/75433.html, for instance, to notice.html:
RewriteCond %{REQUEST_URI} /messages/
RewriteRule ^(.*)$ /notice.html [R=301,L]
REQUEST_URI is an environment variable containing the specific page requested on the web site, excluding the web site's base URL. The RewriteRule drops all text after the base URL and replaces it with notice.html, which then appears in the user's browser.
IP Addresses of Spammers, Hackers, Leeches and Bothersome Bots
This is the list of banned IP addresses for the web site I admin. It is the product of countless hours poring over my server logs and observing activity from the addresses below. Pay no mind to the date of this blog post. I update this post periodically with my latest blacklist in use on the web sites I admin.
Not all IP addresses listed below are malicious hackers or spammers. I'm an aggressive admin who takes a dim view of bots other than the big three (Google, Bing/MSN, Yahoo) and social media sites like StumbleUpon, Digg and Facebook.
My list bans bots that may be content scrapers or researchers. If they're not human and not related to a major search engine, I don't care what they are. Rule of thumb: if my site gains nothing from the transaction and the transaction isn't a human being on a browser, ban it as a waste of system resources and a potential liability. At the very least their leeching could reduce the response time of my site for a legitimate human user.
Some humans were observed engaged in non-browser activity, in some cases attempts at modifying or exploiting our site. If a human user isn't using a browser, but an automated script seeking to exploit the site, then they do not need to visit the site at all.
My many and varied bot-traps nail several IP addresses on a weekly basis, but sometimes just for fun and recreation, I scan for bots with my own eyes. My hands-on methodology of banning IP addresses is to examine my server's log file and note the obvious non-human activity. I identify bots based upon their actions and other criteria which I will not reveal here, but anyone who has looked at a raw server log should understand what I am talking about. It is not difficult to determine who is human and who is a bot, because bots are typically very stupid and, well, robot-like in their behavior, which should not come as a surprise. The authors of these bots are not terribly bright either, judging by what I have seen. When I have identified a single IP address, then that is not the end of the story. I want to know if it belongs to an entire neighborhood which is full of bad bots, because this is often the case in my experience, and why ban one single IP when one can ban 255 or more? It is more efficient to shut down bad IP ranges than play the game of one IP address at a time, which could result in a truly massive .htaccess in the end. I research Project Honeypot (I have been a registered user of that site for years and even authored a dark .css style for the web site) and related sites to determine the true nature of an IP address and the neighborhood in which it resides.
I take a dim view of Russia and China. For one thing, these states do not value freedom of speech and do not enshrine democratic principles. Also, our site is English, and I don't envision attracting any Russian or Chinese fans in this lifetime. Unlike some admins, I don't ban the entire region. But if I notice a large number of spammers in an IP address neighborhood in Russia or China, I don't hesitate to ban the entire neighborhood rather than bother with singling out individual IP addresses. I'm also skeptical of Poland and Ukraine, based on the many bad bots I've witnessed from those nations. Among all nations, I give China short shrift, because of its poor reputation and poor compliance with standards. When you look up an IP address with WHOIS, Chinese accounts sometimes don't specify a small network neighborhood, but offer a range that may encompass millions of IP addresses rather than just hundreds or thousands. Where China is concerned, if I detect numerous bad bots in a general neighborhood, and China refuses to specify a proper narrow range in WHOIS, then I will not hesitate to ban the millions. With other countries, I take a more refined approach, depending upon my evaluation of the level of corruption in their government. I am reluctant to ban entire neighborhoods in democratic nations, but if an IP range is reserved exclusively for web hosting or what is termed "Direct Allocation," then that is a red flag.
I recommend the .htaccess security measures found on Perishable Press. The .htaccess code found on Perishable Press requires testing and adjustment for specific sites, because it is more complex than a mere Deny-from. However, Jeff's code is valuable and well-worth the effort of refining, because it adds another layer of protection in case attackers slip past the Deny-from barrier. I have integrated some of the intel from his excellent "5G Blacklist 2012" into my own .htaccess, but I do not include those bits here, because the credit for that is his, not mine, and he updates it and has the best version. I recommend visiting his site and merging his intel with mine to make a synthesis. An admin that does so may notice a reduction in leach and spam traffic. My .htaccess is a synthesis of my own work with that of Perishable Press. As for Wizcraft, which I used to recommend, I do not use his intel any longer, because he does not respond to emails, and some of his deny-froms have proven problematic, causing technical issues with Wordpress, for instance. I only use Perishable Press intel and my own. My code is in use on a Wordpress site right now.
Bear in mind that intel is perishable with the changing nature of the Internet, and I'm not sure what the shelf life is. I'll try and keep it updated periodically, but there are no guarantees. Some admins grant a reprieve to addresses on an annual basis. They worry that spammers will relocate, and the IP addresses may then become legit. I don't feel like that is a worthwhile concern for my purposes. I would rather continue blocking address ranges that have a lengthy history of hosting spammers and hackers. Let's say a spammer abandons or gets booted from an IP. The hosting company will rent the IP to somebody else, and chances are that sooner or later they will rent it to another spammer, because they were lax enough to rent to one spammer and are likely to make the same mistake again in pursuit of the almighty dollar. The only good IP addresses are those in use by end-users of an Internet Service Provider and major search engines such as Google. Anything else is questionable at best and likely to be used by spammers sooner or later.
You can pop this baby right into your .htaccess file if you admin an Apache server. Beware of Blogger's word wrap, though, which sometimes creates syntax errors in copied and pasted code. Note that the last Order directive in an .htaccess file will be the one used. If you already have a Limit paragraph in your .htaccess, add the Deny's to it, but use only one Limit container.
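For anyone unsure what that container looks like, a minimal sketch with placeholder addresses; the real blacklist is far longer:
<Limit GET POST PUT>
Order Allow,Deny
Allow from all
#Placeholder entries; substitute the actual blacklist
Deny from 216.176.177.162
Deny from 192.0.2.0/24
</Limit>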
If you would like to hire me to harden the security of your WordPress blog, my one-time flat fee is $250. I can do all of the things described here and a bit more that is not. I require a response to emailed questions, payment in advance, and for twenty-four hours, admin access as well as Secure Ftp (SFTP) access to your site. As of 2012, I no longer offer this service, because I do not have the spare time to deal with new customers.