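A record of what I block in my own site's robots.txt, with some code and examples:
Admin directory: for security, use a two-level admin path /a/xxxx/ and block only /a/ for spiders. This keeps the real admin path out of robots.txt while still keeping spiders out of the admin area.
Cache: stop spiders from crawling static cache files.
Downloads: stop spiders from crawling the download directory; if it is unused, delete it.
Editors: stop spiders from crawling the editor directory, which also keeps it from being discovered and becoming a security risk.
Email: stop spiders from crawling static email templates.
Other pages: block pages that have no indexing value.
Images: stop spiders from crawling any image type other than JPG/jpg.
Core file directory: stop spiders from crawling includes and its subdirectories directly (functions, class libraries, models, templates, and so on).
Media: stop spiders from crawling the playable-media directory; if it is unused, delete it.
Pages with extra parameters: stop spiders from crawling pages that carry query parameters.
Archive file types: RAR, ZIP, GZ.
Useless and malicious spiders: block them.
Specify the location of sitemap.xml.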
Directory blocking:
User-agent: *
Disallow: /a/
Disallow: /cache/
Disallow: /download/
Disallow: /editors/
Disallow: /email/
Disallow: /extras/
Disallow: /images/
Disallow: /includes/
Disallow: /media/
Disallow: /pub/
Disallow: /nddbc.html
Disallow: /page_not_found.php
Disallow: /login.html
Disallow: /privacy.html
Disallow: /conditions.html
Disallow: /contact_us.html
Disallow: /gv_faq.html
Disallow: /discount_coupon.html
Disallow: /unsubscribe.html
Disallow: /shopping_cart.html
Disallow: /ask_a_question.html
Disallow: /popup_image_additional.html
Disallow: /product_reviews_write.html
Disallow: /tell_a_friend.html
Disallow: /pages-popup_image.html
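A quick way to sanity-check prefix rules like these is Python's standard-library urllib.robotparser. The sketch below is only an illustration: it copies three of the rules above, and the bot name ExampleBot and the path /about.html are made-up test values, not part of the original file.

```python
from urllib.robotparser import RobotFileParser

# A small excerpt of the directory rules above.
RULES = """\
User-agent: *
Disallow: /a/
Disallow: /cache/
Disallow: /login.html
"""

BASE = "http://www.xxx.jp"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# ExampleBot is a hypothetical crawler name; it falls under the "*" group.
admin_blocked = not parser.can_fetch("ExampleBot", BASE + "/a/xxxx/")
login_blocked = not parser.can_fetch("ExampleBot", BASE + "/login.html")
# /about.html is a hypothetical page: it starts with "/a" but not with "/a/".
about_allowed = parser.can_fetch("ExampleBot", BASE + "/about.html")

print(admin_blocked, login_blocked, about_allowed)
```

Because `Disallow: /a/` ends with a slash, it covers everything under /a/ but not a page that merely begins with /a.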
Stop spiders from crawling non-JPG images (product images are restricted to JPG format):
User-agent: Googlebot
# Path patterns must start with / or *; "$" anchors the end of the URL
Allow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
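One thing to watch with a dedicated Googlebot group: as far as I understand RFC 9309 and Google's documentation, a crawler follows only the most specific group that names it and does not merge in the `User-agent: *` rules. The sketch below uses Python's urllib.robotparser purely to illustrate that group selection. The two-group file is a shortened stand-in for the real one, and ExampleBot is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in: one directory rule for everyone, one image rule for Googlebot.
ROBOTS = """\
User-agent: *
Disallow: /a/

User-agent: Googlebot
Disallow: /*.png$
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

url = "http://www.xxx.jp/a/xxxx/"

# ExampleBot has no group of its own, so the "*" group applies and /a/ is blocked.
generic_can_fetch = parser.can_fetch("ExampleBot", url)

# Googlebot has its own group, which says nothing about /a/, so the URL is allowed.
googlebot_can_fetch = parser.can_fetch("Googlebot", url)

print(generic_can_fetch, googlebot_can_fetch)
```

If the directory rules are meant to apply to Googlebot as well, they would need to be repeated inside the Googlebot group.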
Stop spiders from crawling archive files:
User-agent: *
# Path patterns must start with / or *; no space is allowed before "$"
Disallow: /*.zip$
Disallow: /*.rar$
Disallow: /*.gz$
Disallow: /*.tar$
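The extension rules here and in the image section rely on the `*` and `$` wildcards, which are matched against the URL path from its first character. The following is a minimal sketch of that matching, written from my reading of RFC 9309 (longest match wins, Allow wins a tie). It is not any crawler's actual implementation; rule_matches and is_allowed are names invented for this example, and the test paths are made up.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn every '*' into "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if not anchored:
        regex += ".*"  # without '$' the pattern is a prefix match
    return re.fullmatch(regex, path) is not None


def is_allowed(path: str, rules) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best = None
    for directive, pattern in rules:
        if rule_matches(pattern, path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


ARCHIVE_RULES = [("Disallow", "/*.zip$"), ("Disallow", "/*.rar$"),
                 ("Disallow", "/*.gz$"), ("Disallow", "/*.tar$")]
IMAGE_RULES = [("Allow", "/*.jpg$"), ("Disallow", "/*.jpeg$"),
               ("Disallow", "/*.gif$"), ("Disallow", "/*.png$"),
               ("Disallow", "/*.bmp$")]

print(is_allowed("/download/pack.zip", ARCHIVE_RULES))
print(is_allowed("/images/p1.jpg", IMAGE_RULES))
print(rule_matches(".zip$", "/download/pack.zip"))
```

Under this sketch a pattern written as `.zip$`, without the leading `/*`, matches nothing, because no URL path begins with a dot.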
Specify the sitemap address:
Sitemap: http://www.xxx.jp/sitemap.xml
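To confirm that the Sitemap line is picked up, Python 3.8 and later expose it through RobotFileParser.site_maps(). This is a small sketch that parses the line from a string rather than fetching the live file; the placeholder domain is the one used above.

```python
from urllib.robotparser import RobotFileParser

# The Sitemap line stands on its own; it does not belong to any User-agent group.
ROBOTS = """\
User-agent: *
Disallow: /a/

Sitemap: http://www.xxx.jp/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# site_maps() returns a list of sitemap URLs, or None when the file declares none.
sitemaps = parser.site_maps()
print(sitemaps)
```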
Blocking other useless and malicious spiders:
User-Agent: almaden
Disallow: /
User-Agent: ASPSeek
Disallow: /
User-Agent: Axmo
Disallow: /
User-Agent: BaiduSpider
Disallow: /
User-Agent: booch
Disallow: /
User-Agent: DTS Agent
Disallow: /
User-Agent: Downloader
Disallow: /
User-Agent: EmailCollector
Disallow: /
User-Agent: EmailSiphon
Disallow: /
User-Agent: EmailWolf
Disallow: /
User-Agent: Expired Domain Sleuth
Disallow: /
User-Agent: Franklin Locator
Disallow: /
User-Agent: Gaisbot
Disallow: /
User-Agent: grub
Disallow: /
User-Agent: HughCrawler
Disallow: /
User-Agent: iaea.org
Disallow: /
User-Agent: lcabotAccept
Disallow: /
User-Agent: IconSurf
Disallow: /
User-Agent: Iltrovatore-Setaccio
Disallow: /
User-Agent: Indy Library
Disallow: /
User-Agent: IUPUI
Disallow: /
User-Agent: Kittiecentral
Disallow: /
User-Agent: larbin
Disallow: /
User-Agent: lwp-trivial
Disallow: /
User-Agent: MetaTagRobot
Disallow: /
User-Agent: Missigua Locator
Disallow: /
User-Agent: NetResearchServer
Disallow: /
User-Agent: NextGenSearch
Disallow: /
User-Agent: NPbot
Disallow: /
User-Agent: Nutch
Disallow: /
User-Agent: ObjectsSearch
Disallow: /
User-Agent: Oracle Ultra Search
Disallow: /
User-Agent: PEERbot
Disallow: /
User-Agent: PictureOfInternet
Disallow: /
User-Agent: PlantyNet
Disallow: /
User-Agent: QuepasaCreep
Disallow: /
User-Agent: ScSpider
Disallow: /
User-Agent: SOFT411
Disallow: /
User-Agent: spider.acont.de
Disallow: /
User-Agent: Sqworm
Disallow: /
User-Agent: SSM Agent
Disallow: /
User-Agent: TAMU
Disallow: /
User-Agent: TheUsefulbot
Disallow: /
User-Agent: TurnitinBot
Disallow: /
User-Agent: Tutorial Crawler
Disallow: /
User-Agent: TutorGig
Disallow: /
User-Agent: WebCopier
Disallow: /
User-Agent: WebZIP
Disallow: /
User-Agent: ZipppBot
Disallow: /
User-Agent: Xenu
Disallow: /
User-Agent: Wotbox
Disallow: /
User-Agent: Wget
Disallow: /
User-Agent: NaverBot
Disallow: /
User-Agent: mozDex
Disallow: /
User-Agent: Sosospider
Disallow: /
User-Agent: Baiduspider
Disallow: /
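A list this long is easier to maintain when it is generated. RFC 9309 lets several User-agent lines share one rule set, so the whole blocklist can collapse into a single group with one `Disallow: /`. The sketch below builds such a group from a few of the names above and checks it with urllib.robotparser. The helper blocklist_group and the name ExampleBot are invented for this example. Keep in mind that robots.txt is advisory: a crawler that ignores it has to be blocked at the server instead.

```python
from urllib.robotparser import RobotFileParser

# A few names taken from the list above; extend as needed.
BLOCKED = ["EmailSiphon", "larbin", "WebZIP", "Wget"]


def blocklist_group(names):
    """Build one robots.txt group that disallows everything for all given names."""
    lines = [f"User-agent: {name}" for name in names]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


robots_text = blocklist_group(BLOCKED)
print(robots_text)

parser = RobotFileParser()
parser.parse(robots_text.splitlines())

url = "http://www.xxx.jp/index.html"
# A listed crawler is refused; the version suffix after "/" is ignored when matching.
wget_can_fetch = parser.can_fetch("Wget/1.21", url)
# ExampleBot is a hypothetical crawler that is not on the list.
other_can_fetch = parser.can_fetch("ExampleBot", url)
```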