
call_scrapy_in_a_pyScript


Commits

List of commits on branch master:

- 538c2bed89e6595fe6860bead18d058999f7c68c: "Update readme.md" (wwilliamjqk, 8 years ago)
- abe4c990ffbeeb49295ddce52dd0d8bffd4d1623: "explain why Scrapy dont support multithreading well" (wwilliamjqk, 8 years ago)
- 4983cf80926e62e446bf21b0e12d04eee55cd629: "A neat version by recurring self.parse itself" (wwilliamjqk, 8 years ago)
- 183d72b0e60070090a880f93366f22eb60159833: "fast version use yield in for index_1~xxx" (wwilliamjqk, 8 years ago)
- da56c827e92cc64f9878d3fd163f3183497660b7: "replace \t with ' '*4" (wwilliamjqk, 8 years ago)
- 28cda442a88c772bc2d64523fc7d4b16b17394b0: "Supplement a remark of readme.md" (wwilliamjqk, 8 years ago)

README


Use the Scrapy library to crawl page URLs

I rewrote the crawler with the Scrapy library. Most of the Scrapy examples found online run it from the command line, treating scrapy as a standalone application. I wanted to call its functionality from inside a .py script instead, so I wrote this.
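For reference, here is a minimal sketch of what running Scrapy from inside a Python script (rather than via the `scrapy crawl` command) can look like, using Scrapy's `CrawlerProcess` API. The spider class, its name, and the URL below are illustrative placeholders, not taken from this repository:

```python
# Minimal sketch: run a Scrapy spider in-process from a plain .py script.
# PageSpider and the start URL are hypothetical examples.
import scrapy
from scrapy.crawler import CrawlerProcess

class PageSpider(scrapy.Spider):
    name = "page_spider"
    start_urls = ["https://example.com/"]  # placeholder URL

    def parse(self, response):
        # Extract the page title as a demonstration.
        yield {"title": response.css("title::text").get()}

if __name__ == "__main__":
    # CrawlerProcess starts Scrapy's reactor and runs the spider
    # inside this script, instead of via the command line.
    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(PageSpider)
    process.start()  # blocks until the crawl finishes
```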

Note: this is a different approach from the earlier bs4-based program. Searching online later, I found that Scrapy does not support manual multithreading (the multithreading is Scrapy's own internal optimization and cannot be configured by hand, though you can still run multiple crawlers); even so, combining yield / Request / callback is still very efficient.
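As a rough illustration of that yield / Request / callback pattern, including the "yield in a for loop" and "recurring self.parse" styles mentioned in the commit messages above, here is a hedged sketch; the URLs, CSS selector, and index range are invented for illustration:

```python
# Sketch of the yield / Request / callback pattern. Scrapy schedules
# these requests concurrently on its own event loop, so no manual
# threading is needed.
import scrapy

class IndexSpider(scrapy.Spider):
    name = "index_spider"

    def start_requests(self):
        # "fast version": yield one Request per index inside a for loop;
        # Scrapy fetches them concurrently. The range is hypothetical.
        for i in range(1, 11):
            url = f"https://example.com/page/{i}"
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        yield {"url": response.url}
        # "neat version": follow the next page by yielding a Request
        # whose callback is self.parse again, chaining parse onto itself.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```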