
scrape-yancey-county


Commits

List of commits on branch main:

b79e5a6 Remove tax_data.cs (jjmaroeder, 5 months ago)
9b67778 Update spider (jjmaroeder, 5 months ago)
93b4525 Really add scrapy spider (jjmaroeder, 5 months ago)
220a6f3 Add generate_parcel_ids.py (jjmaroeder, 5 months ago)
3e1c975 Initial commit (jjmaroeder, 5 months ago)

README


Yancey County Tax Scraping Tools

This repository contains various utilities for extracting structured data from information published by Yancey County.

To use these tools, first create a virtual environment, then run pip install -r requirements.txt.
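For example, on a Unix-like system (the environment name .venv is arbitrary):

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt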

parse_tax_scroll.py

This is a utility to generate JSON data from a PDF file of the "tax scroll." An example link (working as of August 18, 2024) can be found on the Yancey County website.

Note that, due to the nature of PDF scraping, this utility doesn't get everything right, and it takes a long time to initialize when operating on the full tax scroll. It helpfully prints messages to STDERR when certain fields are blank (blank fields aren't always indicative of errors, but may be), and the data may still need additional verification.
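As a rough illustration of the approach (not the utility's actual code), the following sketch assumes pdfplumber for text extraction and uses hypothetical field names; the real record-splitting logic depends on the tax scroll's layout:

    import json
    import sys
    import pdfplumber  # assumption: the real utility may use a different PDF library

    FIELDS = ["parcel_id", "owner_name", "total_value"]  # hypothetical field names

    def split_records(page_text):
        # Placeholder: the real, layout-specific parsing of the tax scroll goes here.
        return []

    records = []
    with pdfplumber.open(sys.argv[1]) as pdf:
        for page in pdf.pages:
            for record in split_records(page.extract_text() or ""):
                for field in FIELDS:
                    if not record.get(field):
                        # Mirrors the utility's behavior: flag blank fields on STDERR,
                        # since blanks are suspicious but not always errors.
                        print(f"blank {field} in {record}", file=sys.stderr)
                records.append(record)

    json.dump(records, sys.stdout, indent=2)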

Known issues:

Some fields end up duplicated with a \n between them. This can be fixed by running a regular-expression find-and-replace on the generated output, replacing all occurrences of (.+)\\n\1 with $1 (or whatever the equivalent is in your editor); a Python sketch of this fix-up follows.
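A minimal sketch of that fix-up, assuming the duplicates appear as a literal \n escape in the raw JSON text (the file path is a placeholder):

    import pathlib
    import re

    path = pathlib.Path("tax_scroll.json")  # placeholder path
    text = path.read_text()
    # (.+)\\n\1 matches a field, a literal "\n" escape, then the same field
    # repeated; the replacement keeps a single copy.
    path.write_text(re.sub(r"(.+)\\n\1", r"\1", text))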

generate_parcel_ids.py

Given a JSON file generated by parse_tax_scroll.py, this creates a text file containing one parcel ID per line.
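In outline, the script does something like the following; the "parcel_id" key and the output file name are assumptions, not the script's actual names:

    import json
    import sys

    with open(sys.argv[1]) as f:
        records = json.load(f)

    # One parcel ID per line; "parcel_id" is an assumed key name.
    with open("parcel_ids.txt", "w") as out:
        for record in records:
            out.write(record["parcel_id"] + "\n")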

webtaxpay

This is a Scrapy spider that crawls the Yancey County Assessor's website to collect historical tax bills. It requires a list of parcel IDs in a text file, one parcel ID per line, and generates a CSV file.

Start the spider by running scrapy crawl webtaxpay.
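For reference, a stripped-down Scrapy spider following the same pattern might look like this; the input file name, URL template, and yielded fields are all illustrative assumptions, not the actual spider:

    import scrapy

    class WebTaxPaySpider(scrapy.Spider):
        name = "webtaxpay"

        def start_requests(self):
            # "parcel_ids.txt" and the URL template are placeholders.
            with open("parcel_ids.txt") as f:
                for parcel_id in (line.strip() for line in f):
                    if parcel_id:
                        yield scrapy.Request(
                            f"https://example.invalid/webtaxpay?parcel={parcel_id}",
                            callback=self.parse,
                            cb_kwargs={"parcel_id": parcel_id},
                        )

        def parse(self, response, parcel_id):
            # Each yielded item becomes one CSV row; these field names
            # are assumptions, not the spider's actual schema.
            yield {
                "parcel_id": parcel_id,
                "page_title": response.css("title::text").get(),
            }

If the project doesn't already configure a feed export, the CSV output can be written with Scrapy's standard overwrite-output flag, e.g. scrapy crawl webtaxpay -O tax_bills.csv.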