pyspider 0.3.3

A Powerful Spider System in Python

Homepage: https://github.com/binux/pyspider

Platform: PyPI

Language: Python

License: Apache-2.0

View on registry: https://pypi.python.org/pypi/pyspider/


pyspider

A Powerful Spider (Web Crawler) System in Python.

  • Write scripts in Python with a powerful API
  • Supports Python 2 and 3
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • JavaScript pages supported
  • MySQL, MongoDB, SQLite or PostgreSQL as database backend
  • Task priority, retries, periodic crawls, recrawl by age and more
  • Distributed architecture
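
To make the "task priority" and "recrawl by age" items above concrete, here is a minimal standard-library sketch of the idea. It is not pyspider's scheduler: the class TinyScheduler and its methods are hypothetical names invented for illustration, and time is passed in explicitly as `now` so the behaviour is easy to follow.

```python
import heapq
import itertools


class TinyScheduler:
    """Toy scheduler (hypothetical, not pyspider code).

    Higher-priority URLs are fetched first, and a URL is not queued again
    if it was crawled less than `age` seconds ago.
    """

    def __init__(self):
        self._queue = []               # heap of (-priority, seq, url)
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority
        self._last_crawled = {}        # url -> time of the last crawl

    def submit(self, url, now, priority=0, age=0):
        """Queue `url` unless it was crawled within the last `age` seconds."""
        last = self._last_crawled.get(url)
        if last is not None and now - last < age:
            return False  # still considered fresh, skip the recrawl
        heapq.heappush(self._queue, (-priority, next(self._seq), url))
        return True

    def pop(self, now):
        """Return the next URL to fetch and record `now` as its crawl time."""
        _, _, url = heapq.heappop(self._queue)
        self._last_crawled[url] = now
        return url
```

With this sketch, a URL submitted with priority=5 is popped before one submitted with priority=1, and resubmitting a URL with age=100 is rejected until 100 seconds have passed since it was last popped. Retries and periodic scheduling are left out to keep the example short.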

Documentation: http://docs.pyspider.org/
Tutorial: http://docs.pyspider.org/en/latest/tutorial/

Sample Code

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)  # run on_start once a day
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)  # treat fetched pages as fresh for 10 days
    def index_page(self, response):
        # follow every link whose href starts with "http"
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
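
The sample handler relies on the CSS selector a[href^="http"] to pick absolute links and on the title element for the result. The sketch below approximates those two extraction steps with the Python standard library only, so it can be run without pyspider installed. The names LinkAndTitleParser and extract are hypothetical, and the parsing is a simplification of what response.doc does, not a replacement for it.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect hrefs that start with "http" and the text of <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # href may be missing or valueless, so fall back to ""
            href = dict(attrs).get("href") or ""
            if href.startswith("http"):  # same prefix test as href^="http"
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html):
    """Return the page title and the absolute links found in `html`."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}
```

Calling extract on a page's HTML returns a dictionary with the title and the list of links that index_page would pass on to self.crawl. Relative links such as "/about" are skipped, because the prefix test only matches hrefs beginning with "http".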

Installation

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

TODO

v0.4.0

  • local mode, load script from file.
  • works as a framework (all components running in one process, no threads)
  • redis
  • shell mode like scrapy shell
  • a visual scraping interface like portia

License

Licensed under the Apache License, Version 2.0

GitHub Repository

binux/pyspider

A Powerful Spider (Web Crawler) System in Python.

http://docs.pyspider.org/

Language: Python

Created: February 21, 2014 19:18

Last updated: March 31, 2015 04:15

Last pushed: March 30, 2015 15:41

Size: 4.63 MB

Stars: 4,225

Forks: 857

Watchers: 368

Open issues: 25

Top Contributors

Roy Binux tiancheng91 laapsaap piglei imlonghao zc eiriksm Dody Suria Wijaya e-dorigatti

Related Projects

aio2gis
asyncio-powered 2gis library for Python
PyPI - Python - BSD-3-Clause - Published 3 days ago
dynpy 0.2.1
Dynamical systems for Python
PyPI - C - GPL-2.0+ - Published 5 months ago - 1 star
makeobj 0.8
Powerful Enumeration System
PyPI - Python - BSD-3-Clause - Updated about 2 years ago - 1 star
pypstools 0.0.2
Tools for power system studies tools
PyPI - MIT - Published 8 months ago
pysd 0.2.1
System Dynamics Modeling in Python
PyPI - Python - Other - Updated 4 days ago - 5 stars