
scrape-yancey-county


Commits

List of commits on branch main:

b79e5a6 Remove tax_data.cs (jjmaroeder, 5 months ago)
9b67778 Update spider (jjmaroeder, 5 months ago)
93b4525 Really add scrapy spider (jjmaroeder, 5 months ago)
220a6f3 Add generate_parcel_ids.py (jjmaroeder, 5 months ago)
3e1c975 Initial commit (jjmaroeder, 5 months ago)

README


Yancey County Tax Scraping Tools

This repository contains various utilities for extracting structured data from information published by Yancey County.

To use these tools, first create a virtual environment, then run pip install -r requirements.txt.
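For example, on a Unix-like system (the environment name .venv is arbitrary):

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt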

parse_tax_scroll.py

This is a utility to generate JSON data from a PDF file of the "tax scroll." An example link (working as of August 18, 2024) can be found on the Yancey County website.

Note that, due to the nature of PDF scraping, this utility doesn't get everything right, and it takes a long time to initialize when operating on the full tax scroll. It helpfully prints messages to STDERR when certain fields are blank (blank fields aren't always indicative of errors, but may be), and the data may still need additional verification.
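As a rough illustration of the approach (not the utility's actual code), the following sketch assumes pdfplumber for text extraction and uses hypothetical field names; the real record-splitting logic depends on the tax scroll's layout:

    import json
    import sys
    import pdfplumber  # assumption: the real utility may use a different PDF library

    FIELDS = ["parcel_id", "owner_name", "total_value"]  # hypothetical field names

    def split_records(page_text):
        # Placeholder: the real, layout-specific parsing of the tax scroll goes here.
        return []

    records = []
    with pdfplumber.open(sys.argv[1]) as pdf:
        for page in pdf.pages:
            for record in split_records(page.extract_text() or ""):
                for field in FIELDS:
                    if not record.get(field):
                        # Mirrors the utility's behavior: flag blank fields on STDERR,
                        # since blanks are suspicious but not always errors.
                        print(f"blank {field} in {record}", file=sys.stderr)
                records.append(record)

    json.dump(records, sys.stdout, indent=2)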

Known issues:

Some fields end up duplicated with a \n between them. This can be fixed by running a regular-expression find-and-replace on the generated output, replacing all occurrences of (.+)\\n\1 with $1 (or whatever the equivalent is in your editor); a Python sketch of this fix-up follows.
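A minimal sketch of that fix-up, assuming the duplicates appear as a literal \n escape in the raw JSON text (the file path is a placeholder):

    import pathlib
    import re

    path = pathlib.Path("tax_scroll.json")  # placeholder path
    text = path.read_text()
    # (.+)\\n\1 matches a field, a literal "\n" escape, then the same field
    # repeated; the replacement keeps a single copy.
    path.write_text(re.sub(r"(.+)\\n\1", r"\1", text))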

generate_parcel_ids.py

Given a JSON file generated by parse_tax_scroll.py, this creates a text file containing one parcel ID per line.
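In outline, the script does something like the following; the "parcel_id" key and the output file name are assumptions, not the script's actual names:

    import json
    import sys

    with open(sys.argv[1]) as f:
        records = json.load(f)

    # One parcel ID per line; "parcel_id" is an assumed key name.
    with open("parcel_ids.txt", "w") as out:
        for record in records:
            out.write(record["parcel_id"] + "\n")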

webtaxpay

This is a Scrapy spider that crawls the Yancey County Assessor's website to collect historical tax bills. It requires a list of parcel IDs in a text file, one parcel ID per line, and generates a CSV file.

Start the spider by running scrapy crawl webtaxpay.
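For reference, a stripped-down Scrapy spider following the same pattern might look like this; the input file name, URL template, and yielded fields are all illustrative assumptions, not the actual spider:

    import scrapy

    class WebTaxPaySpider(scrapy.Spider):
        name = "webtaxpay"

        def start_requests(self):
            # "parcel_ids.txt" and the URL template are placeholders.
            with open("parcel_ids.txt") as f:
                for parcel_id in (line.strip() for line in f):
                    if parcel_id:
                        yield scrapy.Request(
                            f"https://example.invalid/webtaxpay?parcel={parcel_id}",
                            callback=self.parse,
                            cb_kwargs={"parcel_id": parcel_id},
                        )

        def parse(self, response, parcel_id):
            # Each yielded item becomes one CSV row; these field names
            # are assumptions, not the spider's actual schema.
            yield {
                "parcel_id": parcel_id,
                "page_title": response.css("title::text").get(),
            }

If the project doesn't already configure a feed export, the CSV output can be written with Scrapy's standard overwrite-output flag, e.g. scrapy crawl webtaxpay -O tax_bills.csv.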