A crawler that automatically downloads the transcripts of TED talks. It is built with Scrapy and based on this tutorial: https://blakeboswell.github.io/2016/scrapy-tedtalk/. The code has been modified to work with the latest version of the TED website.
- Install Scrapy
- Download or clone the repo
- Change into the project directory:
  cd ted-transcript-crawler/ted
- Start the crawl:
  scrapy crawl ted_crawl
The output is stripped of all HTML elements and contains only plain text and whitespace. It is saved in JSON Lines format (one JSON object per line).
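
Because the output is JSON Lines, it can be read back one record at a time. The snippet below is a minimal sketch of that, not part of the crawler itself. The file name `ted_transcripts.jl` and the helper `read_jsonl` are hypothetical; substitute the path of the file your crawl produced. The field names depend on the spider's item definition, so the example only lists the keys it finds.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical output path; replace with the file your crawl produced.
    output_path = Path("ted_transcripts.jl")
    if output_path.exists():
        for record in read_jsonl(output_path):
            # Field names are not assumed here; print whatever keys exist.
            print(sorted(record.keys()))
```

Reading line by line keeps memory use low even when the crawl produces a large file, since only one transcript is held in memory at a time.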