
Spatial data cooking show

This is a demo project and template repository showing how I use SpatiaLite with Datasette for quick spatial analysis.

Live show notes: https://docs.google.com/document/d/1z30ZCWGsR_jr6EuHKukJ4kTZQMBryrrEkIgo_5twZjo/edit

Why SQL, SQLite and Datasette

SQL is old. It's boring. It's widely adopted and not controlled by a company that might change it or shut it down next year.

SQLite is a database in a single file. It's fast and durable. It's easy, both in code and in my head, to make and delete and remake lots -- hundreds -- of little databases. They're just files.

There are a lot of shiny, brittle things out there. This is cast iron coding.

Libraries included

Run poetry install to create a virtual environment and get the latest version of everything. I also recommend installing SpatiaLite.
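
In practice, setup looks something like this. SpatiaLite itself is a system package, so the install command depends on your platform; these lines assume Homebrew on macOS or apt on Debian/Ubuntu:

    # install Python dependencies into a virtual environment
    poetry install

    # install the SpatiaLite extension (platform-dependent)
    brew install spatialite-tools                    # macOS (Homebrew)
    sudo apt-get install libsqlite3-mod-spatialite   # Debian/Ubuntu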

You might also use ...

These libraries aren't included in the default dependencies but could be useful.

What should I use this for?

Over the past year, I've started using this set of libraries to quickly explore, reshape and reformat data for visualization in news projects. My workflow usually involves three phases: loading, exploration, export.

This stack isn't (yet) a replacement for R, GeoPandas or QGIS. Those are good tools, and if you're more comfortable there, I'm not here to talk you out of them.

This approach is, for me, faster than QGIS for most datasets. It allows me to more quickly iterate on finding interesting subsets (because I can write SQL faster than I can navigate a GUI), and it allows me to see intermediate results, because each query generates a map.

Loading data

For self-contained datasets, I use geojson-to-sqlite. (For shapefiles, I usually transform data to GeoJSON first using ogr2ogr, but shapefile-to-sqlite is also available.)
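
As a sketch, with hypothetical file and table names, loading a shapefile looks like this:

    # convert a shapefile to GeoJSON, then load it as a SpatiaLite geometry table
    ogr2ogr -f GeoJSON counties.geojson counties.shp
    geojson-to-sqlite project.db counties counties.geojson --spatialite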

For tabular data, I use sqlite-utils.
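
For example, loading a CSV (again, file and table names are placeholders):

    # create or append to a table from a CSV file
    sqlite-utils insert project.db population population.csv --csv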

Exploration

If you are new to Datasette, this tutorial is a good place to start.

Once you have data in SpatiaLite, every SQL query that returns a geometry column will now include a map. (If you're using datasette-cluster-map, you'll see a map when your query includes latitude and longitude columns.) Use queries to see different slices of the data.
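
Getting there is one command, assuming SpatiaLite is installed somewhere Datasette can find it:

    # serve the database locally with the SpatiaLite extension loaded
    poetry run datasette project.db --load-extension spatialite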

Here's a map of counties in Massachusetts.

Using datasette-geojson-map lets you style map features using Mapbox's simplestyle-spec.

With sqlite-colorbrewer, it's possible to color features based on data.

Here are countries in Asia, in color.

Combine those, and we can start using Datasette for exploratory visualization:
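
As a sketch of what that combination looks like, here's a hypothetical canned query. The table and column names are made up, and the colorbrewer() call assumes sqlite-colorbrewer's (scheme, classes, index) signature; the fill column is a simplestyle-spec property that datasette-geojson-map picks up:

    # append a canned query to metadata.yml
    cat >> metadata.yml <<'EOF'
    databases:
      project:
        queries:
          asia_colored:
            # "fill" is read by datasette-geojson-map; colorbrewer()
            # comes from sqlite-colorbrewer
            sql: |-
              select
                name,
                colorbrewer('Set2', 8, rowid % 8) as fill,
                geometry
              from countries
              where continent = 'Asia'
    EOF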

Export

Finally, let's get data out of exploration mode and into something more presentable.

This stack is built for reporting, not for storytelling.

For interactive maps, I recommend exporting a version of your data as GeoJSON and building static versions for presentation with Leaflet or Mapbox. For more advanced visualizations, look at d3-geo.

Datasette allows queries to run from the command line using the --get option, which will hit any Datasette URL and return the result. Combine this with canned queries for scripted exports. I usually put this in a Makefile:

combined.geojson: project.db
    poetry run datasette project.db --get /project/combined.geojson \
        --load-extension spatialite \
        --setting sql_time_limit_ms 5000 \
        --setting max_returned_rows 20000 \
        -m metadata.yml > $@

That assumes I've loaded whatever I need into project.db and created a canned query called combined; datasette-geojson then handles the GeoJSON output.
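
With that in place, regenerating the file is one command:

    # rebuild the export whenever project.db changes
    make combined.geojson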

My goal in this step is to end up with a file that won't need any further filtering or processing once I pull it into a frontend tool.

Here are static versions of the examples above.

Datasets to test out