======================
Scrapy Inline Requests
======================

.. image:: https://img.shields.io/pypi/v/scrapy-inline-requests.svg
    :target: https://pypi.python.org/pypi/scrapy-inline-requests

.. image:: https://img.shields.io/pypi/pyversions/scrapy-inline-requests.svg
    :target: https://pypi.python.org/pypi/scrapy-inline-requests

.. image:: https://readthedocs.org/projects/scrapy-inline-requests/badge/?version=latest
    :target: https://readthedocs.org/projects/scrapy-inline-requests/?badge=latest
    :alt: Documentation Status

.. image:: https://img.shields.io/travis/rolando/scrapy-inline-requests.svg
    :target: https://travis-ci.org/rolando/scrapy-inline-requests

.. image:: https://codecov.io/github/rolando/scrapy-inline-requests/coverage.svg?branch=master
    :alt: Coverage Status
    :target: https://codecov.io/github/rolando/scrapy-inline-requests

.. image:: https://landscape.io/github/rolando/scrapy-inline-requests/master/landscape.svg?style=flat
    :target: https://landscape.io/github/rolando/scrapy-inline-requests/master
    :alt: Code Quality Status

.. image:: https://requires.io/github/rolando/scrapy-inline-requests/requirements.svg?branch=master
    :alt: Requirements Status
    :target: https://requires.io/github/rolando/scrapy-inline-requests/requirements/?branch=master

A decorator for writing coroutine-like spider callbacks.

Quickstart
==========

The spider below shows a simple use case of scraping a page and following a few links:

.. code:: python

    from inline_requests import inline_requests
    from scrapy import Spider, Request


    class MySpider(Spider):
        name = 'myspider'
        start_urls = ['http://httpbin.org/html']

        @inline_requests
        def parse(self, response):
            urls = [response.url]
            for i in range(10):
                next_url = response.urljoin('?page=%d' % i)
                try:
                    # The yield expression evaluates to the downloaded response.
                    next_resp = yield Request(next_url)
                    urls.append(next_resp.url)
                except Exception:
                    # A failed request is raised at the yield, so it can be caught here.
                    self.logger.info("Failed request %s", i, exc_info=True)

            # Emit a single item once all pages have been requested.
            yield {'urls': urls}

See the ``examples/`` directory for a more complex spider.
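
The quickstart relies on the ``yield Request(...)`` expression evaluating to the downloaded response, with a failed request re-raised at that same ``yield``. The pure-Python sketch below models that control flow with the standard generator methods ``send()`` and ``throw()``. It is a simplified illustration under stated assumptions, not the decorator's actual implementation: ``FakeRequest``, ``fetch`` and ``drive`` are hypothetical names, and the real decorator hands requests to Scrapy's asynchronous engine instead of downloading them inline.

.. code:: python

    class FakeRequest:
        """Hypothetical stand-in for scrapy.Request (holds only a URL)."""

        def __init__(self, url):
            self.url = url


    def fetch(request):
        """Hypothetical downloader: any URL containing 'bad' fails."""
        if "bad" in request.url:
            raise ConnectionError("download failed: %s" % request.url)
        return "<html>%s</html>" % request.url


    def drive(gen):
        """Run a callback generator to completion and collect its items."""
        items = []
        value, error = None, None
        while True:
            try:
                if error is not None:
                    # Re-raise the failure inside the generator, at its yield.
                    out = gen.throw(error)
                else:
                    # Resume the generator; the pending yield evaluates to `value`.
                    out = gen.send(value)
            except StopIteration:
                return items
            value, error = None, None
            if isinstance(out, FakeRequest):
                try:
                    value = fetch(out)
                except ConnectionError as exc:
                    error = exc
            else:
                items.append(out)


    def parse():
        """Coroutine-like callback written in the quickstart's style."""
        bodies = []
        for url in ["http://example.com/1", "http://example.com/bad"]:
            try:
                body = yield FakeRequest(url)
                bodies.append(body)
            except ConnectionError:
                bodies.append(None)
        yield {"bodies": bodies}


    print(drive(parse()))

Running it prints a single item in which the URL whose simulated download failed is recorded as ``None``, because the exception was caught inside ``parse`` at the ``yield``.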

.. warning::

    The generator resumes its execution only when the response to a request it yielded is processed. This means the generator is not resumed with a response after yielding an item or a request with its own callback.
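
To make the warning concrete, here is a hypothetical, self-contained model in which a driver sends a fake response back into the generator only when it yields a request without a callback, and simply advances it with ``None`` otherwise. ``FakeRequest``, ``drive`` and ``other_callback`` are illustrative names, not library APIs, and the real decorator's handling of items and callback requests may differ in detail; the sketch only shows that the ``yield`` expression receives no response in those cases.

.. code:: python

    class FakeRequest:
        """Hypothetical stand-in for scrapy.Request."""

        def __init__(self, url, callback=None):
            self.url = url
            self.callback = callback


    def other_callback(response):
        """Hypothetical separate callback; never called in this model."""


    def drive(gen):
        """Advance the generator, sending a response only for plain requests."""
        emitted, to_send = [], None
        while True:
            try:
                out = gen.send(to_send)
            except StopIteration:
                return emitted
            emitted.append(out)
            if isinstance(out, FakeRequest) and out.callback is None:
                # A request without a callback: resume with its "response".
                to_send = "response for %s" % out.url
            else:
                # An item or a request with its own callback: no response.
                to_send = None


    def parse():
        received = []
        received.append((yield FakeRequest("http://example.com/a")))
        received.append((yield {"item": 1}))
        received.append((yield FakeRequest("http://example.com/b",
                                           callback=other_callback)))
        yield {"received": received}


    print(drive(parse())[-1])

In this model only the first ``yield`` receives a response; the ``yield`` of the item and the ``yield`` of the request with its own callback both evaluate to ``None``.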

Known Issues
============

* Middlewares can drop or ignore responses with a non-2xx status, in which case the callback does not resume its execution. This can be overcome by setting the ``handle_httpstatus_all`` key in the request's ``meta``. See the `httperror middleware`_ documentation.
* High concurrency and large responses can increase memory usage.
* This decorator assumes the decorated method has the signature ``(self, response)``.
* Wrapped requests may not be serializable by persistent backends, such as the disk-based request queues used when ``JOBDIR`` is set.
* Unless you know what you are doing, the decorated method must be a spider method and return a generator instance.
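
The ``handle_httpstatus_all`` flag from the first item is a ``Request.meta`` key, so in a real spider it would be set as ``yield Request(url, meta={'handle_httpstatus_all': True})``. The pure-Python sketch below is a simplified model of the filtering decision made by Scrapy's HttpErrorMiddleware, showing which responses would reach the generator. ``passes_httperror_filter`` is a hypothetical helper, not a Scrapy function, and the real middleware also consults settings such as ``HTTPERROR_ALLOW_ALL``.

.. code:: python

    def passes_httperror_filter(status, meta, spider_allowed=()):
        """Hypothetical, simplified model of HttpErrorMiddleware's check.

        Returns True if a response with `status` would reach the callback.
        """
        if 200 <= status < 300:
            return True  # successful responses always get through
        if meta.get("handle_httpstatus_all", False):
            return True  # the meta flag lets every status through
        # Otherwise only explicitly allowed statuses get through.
        allowed = meta.get("handle_httpstatus_list", spider_allowed)
        return status in allowed


    print(passes_httperror_filter(404, {}))
    print(passes_httperror_filter(404, {"handle_httpstatus_all": True}))

Under this model a 404 is filtered out by default, so an inline callback waiting on that response would not resume, while the same response passes once the flag is set.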

.. _httperror middleware: http://doc.scrapy.org/en/latest/topics/spider-middleware.html#scrapy.spidermiddlewares.httperror.HttpErrorMiddleware