ted-transcript-crawler

A crawler that automatically downloads the transcripts of all TED talks. It is built with Scrapy and based on this tutorial: https://blakeboswell.github.io/2016/scrapy-tedtalk/. The code has been modified to work with the latest version of the TED website.

To run:

Install Scrapy
Download or clone the repo
Run cd ted-transcript-crawler/ted
Run scrapy crawl ted_crawl

Output:

The output is stripped of all HTML elements and contains only plain text and whitespace. It is saved in JSON Lines format (one JSON object per line).
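
As a minimal sketch of how the JSON Lines output could be read back in Python: the helper name read_jsonl is hypothetical, and no field names are assumed because this README does not list them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Calling list(read_jsonl(path)) with the path of the file the crawler wrote should give one Python object per crawled item. Reading line by line keeps memory use low even for a large crawl.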
