pyspider crawler study - documentation translation - index.md
Posted by sijinge 2 months ago
Summary: an introduction to pyspider
pyspider [![Build Status][Build Status]][Travis CI] [![Coverage Status][Coverage Status]][Coverage] [![Try][Try]][Demo]
========
A Powerful Spider (Web Crawler) System in Python. **[TRY IT NOW!][Demo]**

- Write script in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch); [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
- Task priority, retry, periodical crawl, recrawl by age, etc.
- Distributed architecture, crawl JavaScript pages, Python 2&3, etc.
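"Recrawl by age" means a fetched page is treated as valid for a fixed number of seconds and only re-fetched once that window has passed. As a rough, self-contained illustration of the idea (a simplified model, not pyspider's actual implementation), the decision reduces to a timestamp comparison:

```python
import time

AGE = 10 * 24 * 60 * 60  # page considered valid for 10 days, as in the sample code


def needs_recrawl(last_crawl_time, age=AGE, now=None):
    """Return True when the stored copy is older than `age` seconds
    (a simplified model of pyspider's age-based recrawl rule)."""
    now = time.time() if now is None else now
    return (now - last_crawl_time) > age


# A page crawled 11 days ago is stale; one crawled yesterday is not.
now = time.time()
print(needs_recrawl(now - 11 * 24 * 60 * 60, now=now))  # True
print(needs_recrawl(now - 1 * 24 * 60 * 60, now=now))   # False
```

In pyspider itself the window is set per-handler with the `@config(age=...)` decorator shown in the sample below.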

Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)
Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)
Release notes: [https://github.com/binux/pyspider/releases](https://github.com/binux/pyspider/releases)

Sample Code
-----------

```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)  # schedule on_start to run once a day
    def on_start(self):
        # seed the crawl with the start URL
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)  # a fetched page stays valid for 10 days
    def index_page(self, response):
        # follow every absolute link found on the page
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # the returned dict is stored as the result of this task
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```
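The `a[href^="http"]` selector in `index_page` keeps only links whose `href` starts with `http`, i.e. absolute links. For readers without pyspider installed, a stdlib-only sketch of that same selection (using `html.parser` in place of pyspider's pyquery-based `response.doc`) might look like:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values that start with "http", mimicking
    the CSS selector a[href^="http"] used in index_page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("http"):
                self.links.append(href)


html = '<a href="http://scrapy.org/">Scrapy</a> <a href="/local">local</a>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['http://scrapy.org/'] -- the relative link is skipped
```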

[![Demo][Demo Img]][Demo]


Installation
------------

* `pip install pyspider`
* run command `pyspider`, visit [http://localhost:5000/](http://localhost:5000/)
Quickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)

Contribute
----------

* Use It
* Open [Issue], send PR
* [User Group]
* [Chinese Q&A](http://segmentfault.com/t/pyspider)


TODO
----

### v0.4.0

- [x] local mode, load script from file
- [x] works as a framework (all components running in one process, no threads)
- [x] redis
- [x] shell mode like `scrapy shell`
- [ ] a visual scraping interface like [portia](https://github.com/scrapinghub/portia)


### more

- [x] edit script with vim via [WebDAV](http://en.wikipedia.org/wiki/WebDAV)


License
-------
Licensed under the Apache License, Version 2.0


[Build Status]:         https://img.shields.io/travis/binux/pyspider/master.svg?style=flat
[Travis CI]:            https://travis-ci.org/binux/pyspider
[Coverage Status]:      https://img.shields.io/coveralls/binux/pyspider.svg?branch=master&style=flat
[Coverage]:             https://coveralls.io/r/binux/pyspider
[Try]:                  https://img.shields.io/badge/try-pyspider-blue.svg?style=flat
[Demo]:                 http://demo.pyspider.org/
[Demo Img]:             imgs/demo.png
[Issue]:                https://github.com/binux/pyspider/issues
[User Group]:           https://groups.google.com/group/pyspider-users