
wikibase-dump-filter


Commits

List of commits on branch main.
Unverified
3f89856107fc174765c5fceb1e806d7c8b717ddc

readme: extend options

maxlath committed 2 years ago
Verified
dd699c333b8e7538113d993a76134d430b5a2d32

Merge pull request #42 from jsteemann/jsteemann-patch-1

maxlath committed 2 years ago
Verified
2b191b7d95f3445f5aeaa15e8ad7ef1fa085354c

Update README.md

jsteemann committed 2 years ago
Verified
0f2789ded4888ea4eb2db8deea8cd08c3a4a8eb9

Merge pull request #41 from Daniel-Mietchen/patch-1

maxlath committed 3 years ago
Verified
17b2d3bd2d031872b32e304b94bf9b6e9a154cf9

typo fix

Daniel-Mietchen committed 3 years ago
Unverified
047f1d32a3cdf4b6426ec1c7e283e695277ac9af

5.0.7

maxlath committed 3 years ago

README


wikibase-dump-filter

Filter and format a newline-delimited JSON stream of Wikibase entities.

Typically useful to create a formatted subset of a Wikibase JSON dump.

Some context: This tool was formerly known as wikidata-filter. Wikidata is an instance of Wikibase. This tool was primarily designed with Wikidata in mind, but should be usable for any Wikibase instance.

This project received a Wikimedia Project Grant.




Install

This tool requires Node.js to be installed.

# Install globally
npm install -g wikibase-dump-filter
# Or install just to be used in the scripts of the current project
npm install wikibase-dump-filter

Changelog

See CHANGELOG.md for version info

Download dump

Wikidata dumps

Wikidata provides a bunch of database dumps, among which the desired JSON dump. As a Wikidata dump is a very large file (April 2020: 75 GB compressed), it is recommended to download that file first before doing operations on it, so that if anything crashes you don't have to restart the download from zero (the download time is usually the bottleneck).

wget --continue https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz
cat latest-all.json.gz | gzip -d | wikibase-dump-filter --claim P31:Q5 > humans.ndjson
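To make the --claim P31:Q5 filter concrete, here is a minimal, hypothetical re-implementation of the matching logic in plain Node.js. The entity sample is heavily trimmed, and the actual tool supports much richer claim patterns than this sketch; see its CLI documentation.

```javascript
// Sketch only: keep an entity if one of its statements for `property`
// has an item value with the given id (e.g. P31:Q5, "instance of: human").
function hasClaim (entity, property, value) {
  const statements = (entity.claims || {})[property] || []
  return statements.some(statement => {
    const datavalue = statement.mainsnak && statement.mainsnak.datavalue
    // Item values are serialized as objects carrying an `id` like 'Q5'
    return Boolean(datavalue && datavalue.value && datavalue.value.id === value)
  })
}

// Example entity, trimmed down from the real dump serialization
const douglasAdams = {
  id: 'Q42',
  claims: {
    P31: [ { mainsnak: { datavalue: { value: { id: 'Q5' } } } } ]
  }
}

console.log(hasClaim(douglasAdams, 'P31', 'Q5')) // true
```

Applied to every line of the decompressed dump, this is the shape of the subset that ends up in humans.ndjson.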

Your own Wikibase instance dump

You can generate a JSON dump using the script dumpJson.php. If you are running Wikibase with wikibase-docker, you could use the following command:

cd wikibase-docker
docker-compose exec wikibase /bin/sh -c "php ./extensions/Wikibase/repo/maintenance/dumpJson.php --log /dev/null" > dump.json
cat dump.json | wikibase-dump-filter --claim P1:Q1 > entities_with_claim_P1_Q1.ndjson
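One detail worth knowing if you ever consume such a dump yourself: the JSON dumps are one large JSON array with one entity per line, so most lines end with a comma and the first and last lines are bare brackets. wikibase-dump-filter handles this for you; the sketch below (assuming that standard one-entity-per-line layout) only illustrates the line-level parsing involved.

```javascript
// Each dump line is one array element: skip the enclosing brackets and
// strip the trailing comma before JSON-parsing. Sketch only.
function parseDumpLine (line) {
  line = line.trim()
  if (line === '' || line === '[' || line === ']') return null
  if (line.endsWith(',')) line = line.slice(0, -1)
  return JSON.parse(line)
}

const sampleLines = [ '[', '{"id":"Q1"},', '{"id":"Q2"}', ']' ]
const entities = sampleLines.map(parseDumpLine).filter(entity => entity !== null)
console.log(entities.map(entity => entity.id)) // [ 'Q1', 'Q2' ]
```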

How-to

This package can be used both as a command-line tool (CLI) and as a NodeJS module. These two uses have their own documentation pages, but the options stay the same and are documented in the CLI section.
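Since the tool's output is newline-delimited JSON, any downstream script can consume it with no dependencies. A minimal sketch (this is not the package's own module API, which is documented separately):

```javascript
// Parse a newline-delimited JSON string into an array of entities.
// Sketch only: a real consumer would stream line by line (e.g. with the
// core readline module) rather than buffer the whole output in memory.
function parseNdjson (text) {
  return text
    .split('\n')
    .filter(line => line !== '')
    .map(line => JSON.parse(line))
}

const output = '{"id":"Q42"}\n{"id":"Q5"}\n'
console.log(parseNdjson(output).map(entity => entity.id)) // [ 'Q42', 'Q5' ]
```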

See Also


You may also like

inventaire banner

Do you know Inventaire? It's a web app to share books with your friends, built on top of Wikidata! And it's libre software too.

License

MIT