Blocking robots with robots.txt and blocking them by IP address are both bad ideas.
Bad robots generally do not pay attention to robots.txt.
Blocking IP addresses, as some have suggested, has all kinds of repercussions.
The following suggestions should help.
Bots normally do not change their names very often, so put the following in the
.htaccess file in your root directory, and deny all access to inner directories except from your local IPs.
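As a rough sketch of the "deny all except local IPs" part, an .htaccess file in an inner directory could look like the block below. The address range is a hypothetical placeholder, not something from this post, and the syntax is the Apache 2.2 style used in the rest of this article (on Apache 2.4 it should only work if mod_access_compat is loaded). It also assumes the server permits these directives in .htaccess (AllowOverride Limit).

```apache
# .htaccess in an inner directory: refuse everyone except local addresses.
# With "Order Deny,Allow" the Deny lines are checked first, then the Allow
# lines, and a client that matches both is allowed.
Order Deny,Allow
Deny from all
# Hypothetical local addresses; replace with your own
Allow from 127.0.0.1
Allow from 192.168.1.0/24
```

Any client outside the listed addresses should receive a 403 Forbidden response for that directory.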
Using mod_rewrite (Apache)
If the string or regular expression matches the User-Agent HTTP header, the request gets a 403 Forbidden response.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider
RewriteRule ^.* - [F,L]
You can change the RewriteRule to send them somewhere else, such as a non-linked page
that records hits and user agents, letting you know how many bad bots are taking
the bait! You will have to use PHP and MySQL if you do not want to save the log in a file.
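Here is one way the redirect-to-a-trap idea might look. The page name /bot-trap.php is a hypothetical placeholder, and the logging script itself is not shown. The first condition is there because an .htaccess rule set is run again after an internal rewrite, so the trap page has to be excluded or the rule would loop.

```apache
RewriteEngine On
# Skip the trap page itself so the rewrite does not loop
RewriteCond %{REQUEST_URI} !^/bot-trap\.php$
# Same user-agent tests as above, chained with OR
RewriteCond %{HTTP_USER_AGENT} ^Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider
# Internally rewrite every matching request to the (hypothetical) trap page
RewriteRule ^ /bot-trap.php [L]
```

Because this is an internal rewrite rather than a redirect, the bot is not told the trap page's address; it simply receives that page's output for whatever URL it asked for.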
If you do not have mod_rewrite, the following list should help; it relies on mod_setenvif instead, and I believe this method uses fewer server resources.
SetEnvIfNoCase user-agent "^Baiduspider" bad_bot=1
SetEnvIfNoCase user-agent "^BlackWidow" bad_bot=1
SetEnvIfNoCase user-agent "^Bot\ mailto:craftbot@yahoo.com" bad_bot=1
SetEnvIfNoCase user-agent "^ChinaClaw" bad_bot=1
SetEnvIfNoCase user-agent "^Custo" bad_bot=1
SetEnvIfNoCase user-agent "^DISCo" bad_bot=1
SetEnvIfNoCase user-agent "^Download\ Demon" bad_bot=1
SetEnvIfNoCase user-agent "^eCatch" bad_bot=1
SetEnvIfNoCase user-agent "^EirGrabber" bad_bot=1
SetEnvIfNoCase user-agent "^EmailSiphon" bad_bot=1
SetEnvIfNoCase user-agent "^EmailWolf" bad_bot=1
SetEnvIfNoCase user-agent "^Express\ WebPictures" bad_bot=1
SetEnvIfNoCase user-agent "^ExtractorPro" bad_bot=1
SetEnvIfNoCase user-agent "^EyeNetIE" bad_bot=1
SetEnvIfNoCase user-agent "^FlashGet" bad_bot=1
SetEnvIfNoCase user-agent "^GetRight" bad_bot=1
SetEnvIfNoCase user-agent "^GetWeb!" bad_bot=1
SetEnvIfNoCase user-agent "^Go!Zilla" bad_bot=1
SetEnvIfNoCase user-agent "^Go-Ahead-Got-It" bad_bot=1
SetEnvIfNoCase user-agent "^GrabNet" bad_bot=1
SetEnvIfNoCase user-agent "^Grafula" bad_bot=1
SetEnvIfNoCase user-agent "^HMView" bad_bot=1
SetEnvIfNoCase user-agent "HTTrack" bad_bot=1
SetEnvIfNoCase user-agent "^Image\ Stripper" bad_bot=1
SetEnvIfNoCase user-agent "^Image\ Sucker" bad_bot=1
SetEnvIfNoCase user-agent "Indy\ Library" bad_bot=1
SetEnvIfNoCase user-agent "^InterGET" bad_bot=1
SetEnvIfNoCase user-agent "^Internet\ Ninja" bad_bot=1
SetEnvIfNoCase user-agent "^JetCar" bad_bot=1
SetEnvIfNoCase user-agent "^JOC\ Web\ Spider" bad_bot=1
SetEnvIfNoCase user-agent "^larbin" bad_bot=1
SetEnvIfNoCase user-agent "^LeechFTP" bad_bot=1
SetEnvIfNoCase user-agent "^Mass\ Downloader" bad_bot=1
SetEnvIfNoCase user-agent "^MIDown\ tool" bad_bot=1
SetEnvIfNoCase user-agent "^Mister\ PiX" bad_bot=1
SetEnvIfNoCase user-agent "^Navroad" bad_bot=1
SetEnvIfNoCase user-agent "^NearSite" bad_bot=1
SetEnvIfNoCase user-agent "^NetAnts" bad_bot=1
SetEnvIfNoCase user-agent "^NetSpider" bad_bot=1
SetEnvIfNoCase user-agent "^Net\ Vampire" bad_bot=1
SetEnvIfNoCase user-agent "^NetZIP" bad_bot=1
SetEnvIfNoCase user-agent "^Octopus" bad_bot=1
SetEnvIfNoCase user-agent "^Offline\ Explorer" bad_bot=1
SetEnvIfNoCase user-agent "^Offline\ Navigator" bad_bot=1
SetEnvIfNoCase user-agent "^PageGrabber" bad_bot=1
SetEnvIfNoCase user-agent "^Papa\ Foto" bad_bot=1
SetEnvIfNoCase user-agent "^pavuk" bad_bot=1
SetEnvIfNoCase user-agent "^pcBrowser" bad_bot=1
SetEnvIfNoCase user-agent "^RealDownload" bad_bot=1
SetEnvIfNoCase user-agent "^ReGet" bad_bot=1
SetEnvIfNoCase user-agent "^SiteSnagger" bad_bot=1
SetEnvIfNoCase user-agent "^SmartDownload" bad_bot=1
SetEnvIfNoCase user-agent "^SuperBot" bad_bot=1
SetEnvIfNoCase user-agent "^SuperHTTP" bad_bot=1
SetEnvIfNoCase user-agent "^Surfbot" bad_bot=1
SetEnvIfNoCase user-agent "^tAkeOut" bad_bot=1
SetEnvIfNoCase user-agent "^Teleport\ Pro" bad_bot=1
SetEnvIfNoCase user-agent "^VoidEYE" bad_bot=1
SetEnvIfNoCase user-agent "^Web\ Image\ Collector" bad_bot=1
SetEnvIfNoCase user-agent "^Web\ Sucker" bad_bot=1
SetEnvIfNoCase user-agent "^WebAuto" bad_bot=1
SetEnvIfNoCase user-agent "^WebCopier" bad_bot=1
SetEnvIfNoCase user-agent "^WebFetch" bad_bot=1
SetEnvIfNoCase user-agent "^WebGo\ IS" bad_bot=1
SetEnvIfNoCase user-agent "^WebLeacher" bad_bot=1
SetEnvIfNoCase user-agent "^WebReaper" bad_bot=1
SetEnvIfNoCase user-agent "^WebSauger" bad_bot=1
SetEnvIfNoCase user-agent "^Website\ eXtractor" bad_bot=1
SetEnvIfNoCase user-agent "^Website\ Quester" bad_bot=1
SetEnvIfNoCase user-agent "^WebStripper" bad_bot=1
SetEnvIfNoCase user-agent "^WebWhacker" bad_bot=1
SetEnvIfNoCase user-agent "^WebZIP" bad_bot=1
SetEnvIfNoCase user-agent "^Widow" bad_bot=1
SetEnvIfNoCase user-agent "^WWWOFFLE" bad_bot=1
SetEnvIfNoCase user-agent "^Xaldon\ WebSpider" bad_bot=1
SetEnvIfNoCase user-agent "^Zeus" bad_bot=1
Order Allow,Deny
Allow from all
Deny from env=bad_bot
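The Order/Allow/Deny lines above are Apache 2.2 syntax. On Apache 2.4 without mod_access_compat, a roughly equivalent sketch uses the Require directives from mod_authz_core. This assumes the SetEnvIfNoCase lines stay as they are and that the server allows these directives in .htaccess (AllowOverride AuthConfig).

```apache
# Apache 2.4 style: allow everyone except requests flagged as bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

A negated Require cannot grant access on its own, which is why it sits inside RequireAll next to "Require all granted".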
Please comment on this issue with suggestions so that we might expand it.




