This repository contains the code for various utilities to get structured data from information published by Yancey County.
To use these tools, create a virtual environment, then run `pip install -r requirements.txt`.
This is a utility to generate JSON data from a PDF file of the "tax scroll." An example link (working as of August 18, 2024) is here on the Yancey County website.
Note that, due to the nature of PDF scraping, this utility doesn't quite get everything right, and it will take a long time to initialize when operating on the full tax scroll. It helpfully prints messages to STDERR when certain fields are blank (blank fields aren't always indicative of errors, but they may be), and the data may still need additional verification.
Some fields end up duplicated with a `\n` between them. This can be fixed by running a regular-expression find and replace on the generated output, transforming all occurrences of `(.+)\\n\1` into `$1` (or `\1`, or whatever the equivalent replacement syntax is in your editor).
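The same cleanup can be scripted rather than done in an editor. A minimal sketch in Python, assuming the duplicated fields appear in the raw JSON text with a literal `\n` escape (backslash plus `n`) between the two copies:

```python
import re


def collapse_duplicated_fields(text: str) -> str:
    """Collapse fields duplicated across a literal "\\n" escape.

    Matches "value\\nvalue" (a literal backslash-n between the two
    copies, as it appears in the raw JSON text) and keeps one "value".
    """
    return re.sub(r"(.+)\\n\1", r"\1", text)
```

You could apply this to the whole generated file, e.g. `Path("output.json").write_text(collapse_duplicated_fields(Path("output.json").read_text()))` (filename hypothetical).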
Given a JSON file generated by `parse_tax_scroll.py`, this creates a txt file that contains one parcel ID per line.
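A minimal sketch of that extraction step, assuming the JSON output is a list of records and that the key holding the parcel ID is named `parcel_id` (the actual key name in the output of `parse_tax_scroll.py` may differ):

```python
import json
from pathlib import Path


def write_parcel_ids(json_path: str, txt_path: str) -> None:
    """Write one parcel ID per line from a tax-scroll JSON file."""
    # "parcel_id" is an assumed key name; check the actual JSON output.
    records = json.loads(Path(json_path).read_text())
    ids = [record["parcel_id"] for record in records]
    Path(txt_path).write_text("\n".join(ids) + "\n")
```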
This is a Scrapy spider that crawls the Yancey County Assessor's website to collect historical tax bills. It requires a list of parcel IDs in a text file, one parcel ID per line, and generates a CSV file. Start the spider by running `scrapy crawl webtaxpay`.