
checkbot study notes

# checkbot --help
Checkbot 1.79 command line options:

--cookies            Accept cookies from the server
--debug              Debugging mode: No pauses, stop after 25 links.
--file file          Use file as basis for output file names.
--help               Provide this message.
--mailto address     Mail brief synopsis to address when done.
--noproxy domains    Do not proxy requests to given domains.
--verbose            Verbose mode: display many messages about progress.
--url url            Start URL
--match match        Check pages only if URL matches `match'
                     If no match is given, the start URL is used as a match
--exclude exclude    Exclude pages if the URL matches 'exclude'
--filter regexp      Run regexp on each URL found
--ignore ignore      Ignore URLs matching 'ignore'
--suppress file      Use contents of 'file' to suppress errors in output
--note note          Include Note (e.g. URL to report) along with Mail message.
--proxy URL          URL of proxy server for HTTP and FTP requests.
--internal-only      Only check internal links, skip checking external links.
--sleep seconds      Sleep this many seconds between requests (default 0)
--style url          Reference the style sheet at this URL.
--timeout seconds    Timeout for http requests in seconds (default 120)
--interval seconds   Maximum time interval between updates (default 10800)
--dontwarn codes     Do not write warnings for these HTTP response codes
--enable-virtual     Use only virtual names, not IP numbers for servers
--language           Specify 2-letter language code for language negotiation
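
To make the interplay of --filter, --ignore, --match and --exclude in the help output above more concrete, here is a rough Python sketch of how a single URL might be handled. This is only my reading of the option descriptions, not checkbot's own code: the order of the checks, the function name classify, the example patterns and the example.com URLs are all assumptions made for illustration. Python's re module also differs from Perl regular expressions in places.

```python
import re

def classify(url, match, exclude=None, ignore=None, url_filter=None):
    """Return (rewritten_url, category) for one URL.

    category is 'ignored', 'external' or 'local'.
    Assumed order of checks: filter first, then ignore, then exclude, then match.
    """
    if url_filter is not None:
        pattern, replacement = url_filter
        # Note: re.sub replaces every occurrence; whether checkbot's
        # filter does the same is not verified here.
        url = re.sub(pattern, replacement, url)
    if ignore and re.search(ignore, url):
        return url, "ignored"    # not checked at all
    if exclude and re.search(exclude, url):
        return url, "external"   # excluded even if it matches `match`
    if re.search(match, url):
        return url, "local"      # checked as part of the site
    return url, "external"

# Hypothetical settings for a site at www.example.com
settings = dict(
    match=r"^http://www\.example\.com/",
    exclude=r"/archive/",
    ignore=r"\.pdf$",
    url_filter=(r";jsessionid=[^?#]*", ""),  # strip a session id from the URL
)

for candidate in [
    "http://www.example.com/docs/index.html;jsessionid=ABC123",
    "http://www.example.com/archive/2007.html",
    "http://www.example.com/manual.pdf",
    "http://other.example.org/",
]:
    print(classify(candidate, **settings))
```

The filter runs first so that the other patterns see the rewritten URL. That matters when a content management system appends session IDs that would otherwise make every link look different.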

SYNOPSIS
checkbot [--cookies] [--debug] [--file file name] [--help]
    [--mailto email addresses] [--noproxy list of domains] [--verbose]
    [--url start URL] [--match match string] [--exclude exclude string]
    [--proxy proxy URL] [--internal-only] [--ignore ignore string]
    [--filter substitution regular expression] [--style style file URL]
    [--note note] [--sleep seconds] [--timeout timeout] [--interval seconds]
    [--dontwarn HTTP response codes] [--enable-virtual]
    [--language language code] [--suppress suppression file] [start URLs]
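
As a small illustration of the synopsis, the following Python sketch assembles a checkbot command line from a few of the options listed there. It only uses flags that appear in the help output. The helper names build_checkbot_command and run_checkbot, the parameter names and the example.com start URL are hypothetical, and it assumes a checkbot executable is on the PATH if you actually run it.

```python
import shlex
import subprocess

def build_checkbot_command(start_url, match=None, sleep=None, timeout=None,
                           report_file=None, mailto=None, internal_only=False):
    """Build an argument list for checkbot from a handful of its options."""
    cmd = ["checkbot", "--url", start_url]
    if match is not None:
        cmd += ["--match", match]
    if sleep is not None:
        cmd += ["--sleep", str(sleep)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if report_file is not None:
        cmd += ["--file", report_file]
    if mailto is not None:
        cmd += ["--mailto", mailto]
    if internal_only:
        cmd.append("--internal-only")
    return cmd

def run_checkbot(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)

cmd = build_checkbot_command(
    "http://www.example.com/",   # hypothetical start URL
    sleep=1,
    timeout=60,
    report_file="checkbot.html",
    internal_only=True,
)
# Print the command as it would be typed in a shell; call run_checkbot(cmd) to execute it.
print(shlex.join(cmd))
```

Passing the options as a list rather than one string avoids shell quoting problems when a match or filter expression contains characters such as | or $.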

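[--cookies] Accept cookies from the server. By default cookies are not accepted.
[--debug] Debugging mode. It is no longer really supported, but it keeps some files around that would otherwise be deleted.
[--file file name] Use the given file name as the basis for the report pages. The default is checkbot.html.
[--help] Show the help message.
[--mailto email addresses] Mail a brief summary of the results to the given email addresses when the run is done.
[--noproxy list of domains] Domains for which the proxy is not used, separated by commas.
[--verbose] Show output while running, including the links checked and the results.
[--url start URL] The URL at which checking starts.
[--match match string] Selects which pages count as local: a page is checked only if its URL matches the match string. If no match string is given, the start URL is used as the match.
[--exclude exclude string] Exclude pages whose URL matches the exclude string.
[--proxy proxy URL] URL of the proxy server to use for HTTP and FTP requests.
[--internal-only] Check internal links only and skip the checking of external links.
[--ignore ignore string] URLs matching this regular expression are not checked. Useful for filtering out URLs.
[--filter substitution regular expression] Defines a filter, which is a Perl regular expression. The filter is run on every URL found, rewriting the URL before it enters the queue to be checked.
[--style style file URL] URL of a style sheet to reference from the report pages, so that the output fits the design of your own site.
[--note note] A note that is included verbatim in the mail message (see --mailto).
[--sleep seconds] Number of seconds to sleep between requests (default 0). This reduces bandwidth use.
[--timeout timeout] Timeout for requests in seconds (default 120), that is, how long to wait for a response before giving up.
[--interval seconds] Maximum interval between updates of the report pages. The default is 3 hours (10800 seconds); the interval starts at one minute and is gradually extended up to the maximum.
[--dontwarn HTTP response codes] Do not write warnings for the given server response codes, for example 301 and 404.
[--enable-virtual] Lets the crawler handle virtual hosts by distinguishing servers by name instead of by IP number.
[--language language code] Two-letter language code used for language negotiation.
[--suppress suppression file] A file whose contents (error codes and URLs) suppress the matching errors in the output.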
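
The last two kinds of report filtering, --dontwarn and --suppress, can be sketched in Python as follows. This is an illustration under assumptions, not checkbot's implementation: it assumes that --dontwarn takes a regular expression over response codes such as "(301|404)", and that the suppression file holds one "response-code URL" pair per line with # starting a comment line, which is how I understand the checkbot manual. Please verify both against your version. The function names and the example.com URLs are hypothetical.

```python
import re

def parse_suppressions(text):
    """Parse suppression lines of the assumed form '<code> <URL>'.

    Blank lines, lines starting with '#', and malformed lines are skipped.
    """
    pairs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        code, url = parts
        pairs.add((code, url.strip()))
    return pairs

def should_report(code, url, dontwarn=None, suppressions=frozenset()):
    """Decide whether a (response code, URL) problem belongs in the report."""
    if dontwarn and re.search(dontwarn, str(code)):
        return False  # response code filtered out by the dontwarn expression
    return (str(code), url) not in suppressions

# Hypothetical suppression file contents
suppression_text = """\
# 403 Forbidden
403 http://www.example.com/private/
"""
suppressions = parse_suppressions(suppression_text)

for code, url in [
    (301, "http://www.example.com/old"),
    (403, "http://www.example.com/private/"),
    (500, "http://www.example.com/broken"),
]:
    print(code, url, should_report(code, url, "(301|404)", suppressions))
```

The difference between the two is scope: the dontwarn expression hides a response code everywhere, while a suppression entry hides it only for one specific URL.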

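My translation skills are limited, so please do not repost the translation above. If you do repost it, please keep a link to this site:
Original source: http://www.gccgle.com/2007/09/checkbot.html

Please respect the author's work.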

About

This page contains a single entry from the blog posted on September 05, 2007 09:48 PM.
