

wikibase-dump-filter

Filter and format a newline-delimited JSON stream of Wikibase entities.

Typically useful to create a formatted subset of a Wikibase JSON dump.
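As an illustration of what filtering by claim means on such a stream, here is a minimal sketch (not this package's actual code) that checks whether an entity holds a given claim such as `P31:Q5` (instance of: human), following the standard Wikibase entity JSON layout:

```javascript
// Minimal sketch (not wikibase-dump-filter's code): check whether a Wikibase
// entity JSON object holds a given claim, e.g. P31:Q5 (instance of: human)
function hasClaim (entity, property, value) {
  const claims = (entity.claims || {})[property] || []
  return claims.some(claim => {
    const snak = claim.mainsnak
    return snak && snak.snaktype === 'value' && snak.datavalue.value.id === value
  })
}

// Example entity, reduced to the relevant part of the Wikibase JSON layout
const entity = {
  id: 'Q42',
  claims: {
    P31: [ { mainsnak: { snaktype: 'value', datavalue: { value: { id: 'Q5' } } } } ]
  }
}

console.log(hasClaim(entity, 'P31', 'Q5')) // → true
```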

Some context: This tool was formerly known as wikidata-filter. Wikidata is an instance of Wikibase. This tool was primarily designed with Wikidata in mind, but it should be usable with any Wikibase instance.

This project received a Wikimedia Project Grant.




Install

This tool requires Node.js to be installed.

# Install globally
npm install -g wikibase-dump-filter
# Or install just to be used in the scripts of the current project
npm install wikibase-dump-filter

Changelog

See CHANGELOG.md for version info.

Download dump

Wikidata dumps

Wikidata provides a number of database dumps, among them the desired JSON dump. As a Wikidata dump is a very large file (April 2020: 75 GB compressed), it is recommended to finish downloading that file before doing operations on it, so that if anything crashes you don't have to restart the download from zero (the download time usually being the bottleneck).

wget --continue https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz
cat latest-all.json.gz | gzip -d | wikibase-dump-filter --claim P31:Q5 > humans.ndjson

Your own Wikibase instance dump

You can generate a JSON dump using the script dumpJson.php. If you are running Wikibase with wikibase-docker, you could use the following command:

cd wikibase-docker
docker-compose exec wikibase /bin/sh -c "php ./extensions/Wikibase/repo/maintenance/dumpJson.php --log /dev/null" > dump.json
cat dump.json | wikibase-dump-filter --claim P1:Q1 > entities_with_claim_P1_Q1.ndjson

How-to

This package can be used both as a command-line tool (CLI) and as a NodeJS module. These two uses each have their own documentation page, but the options stay the same and are documented in the CLI section.
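For a sense of what processing such a dump involves, here is a stdlib-only sketch (not this package's API) that parses one line of a raw Wikidata JSON dump into an entity object. The raw dump is one large JSON array with one entity per line, so entity lines end with a comma and the first and last lines are brackets:

```javascript
// Stdlib-only sketch (not wikibase-dump-filter's API): parse one line of a
// raw Wikidata JSON dump. The dump is one big JSON array with one entity per
// line: entity lines end with a comma, and the first/last lines are brackets.
function parseDumpLine (line) {
  line = line.trim()
  if (line === '[' || line === ']' || line === '') return null
  if (line.endsWith(',')) line = line.slice(0, -1)
  return JSON.parse(line)
}

// Example: a tiny excerpt of a dump
const lines = [
  '[',
  '{"id":"Q42","type":"item"},',
  '{"id":"Q5","type":"item"}',
  ']'
]
const entities = lines.map(parseDumpLine).filter(entity => entity !== null)
console.log(entities.map(entity => entity.id)) // → [ 'Q42', 'Q5' ]
```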

See Also


You may also like

inventaire banner

Do you know Inventaire? It's a web app to share books with your friends, built on top of Wikidata! And it's libre software too.

License

MIT