== Andre Web Scrapper
This project will provide the user with the ability to crawl large websites to determine if they are business sites or not by matching all the internal links with given keywords. Will also keep a list of the internals links that matched a specific keyword.
Site that are not active will be flagged as isActive = false and be reflected in the database and resulting excel file.
== Core Functionalities
- Support large files including 200k sites excel files
- Support multiple keywords to be added to an specific batch crawl
- Stop/Start/Resume Crawl process
- Realtime stats
- Multi-threading for max processing speed
== Additional Information
This is Ruby on Rails web application running on Rails 3.2.8 and Ruby 1.9.3 (both latest version at the moment of writing) database will be MySQL.
- Additional Gems Nokogiri Mechanize RSpec Devise more to be added