Various statistics on Python's standard library.
See stats.ipynb for charts that show the data in various ways.
You can also run an interactive Jupyter session using Binder.
stdlib.csv contains various details about the modules in the stdlib. The table
is built using various JSON files found in this repository (discussed below).
To tweak how various things are treated, you can edit the JSON files and
run aggregate.py to update the table accordingly.
category_usage.csv counts the number of projects which use a specific
module category. It also tallies all the commits the category is made up of.
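As a rough illustration of the per-category project count, the sketch below counts how many projects use at least one module from each category. The dictionary shapes (project name to list of modules, category name to list of modules), the function name, and the sample data are all assumptions made for this example; the actual layouts of usage.json and categories.json may differ.

```python
from collections import Counter


def count_projects_per_category(usage, categories):
    """Count the projects that use at least one module from each category.

    Both argument shapes are assumed for illustration:
    usage maps project name -> iterable of module names,
    categories maps category name -> iterable of module names.
    """
    # Invert the category mapping so each module points at its category.
    module_to_category = {
        module: category
        for category, modules in categories.items()
        for module in modules
    }
    counts = Counter()
    for modules in usage.values():
        # A set ensures a project is counted at most once per category.
        used = {module_to_category[m] for m in modules if m in module_to_category}
        counts.update(used)
    return dict(counts)


# Made-up sample data, not taken from the repository.
sample_usage = {
    "proj-a": ["json", "csv", "os"],
    "proj-b": ["json"],
    "proj-c": ["not_a_stdlib_module"],
}
sample_categories = {
    "File Formats": ["csv"],
    "Internet Data": ["json"],
    "OS": ["os"],
}
sample_counts = count_projects_per_category(sample_usage, sample_categories)
```

Modules that appear in no category are simply ignored here; whether the real aggregation does the same is not stated in this document.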
stats.ipynb is a Jupyter notebook which contains various charts that try
to analyze the data from the CSV in various ways.
Public availability is (mostly) determined by documentation existing in
Doc/library/.
private_modules.json maps public modules to any private modules they depend
on. Modules that "cheat" by using private modules directly instead of the
equivalent public API are not listed as dependents (e.g. multiprocessing
directly using _weakrefset instead of going through weakref).
Ignores Argument Clinic files and tests, but includes header files.
file_map.json maps module name to relative file paths in a git clone.
required.json lists the modules required to start Python (based on
python -v -S -c pass).
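One way to turn the output of that command into a list of module names is sketched below. The function names are hypothetical, and the regular expression assumes the "import ... # ..." line format that CPython 3 prints in verbose mode, which may vary between versions. This is not necessarily how required.json was produced.

```python
import re
import subprocess
import sys

# Matches lines such as "import 'codecs' # <...>" and "import _imp # builtin".
_IMPORT_LINE = re.compile(r"^import '?([\w.]+)'? #")


def parse_verbose_imports(verbose_output):
    """Return module names found in `python -v` output, in first-seen order."""
    seen = {}
    for line in verbose_output.splitlines():
        match = _IMPORT_LINE.match(line)
        if match:
            # dict keys keep insertion order and drop duplicates.
            seen.setdefault(match.group(1), None)
    return list(seen)


def startup_modules(python=sys.executable):
    """Run `python -v -S -c pass` and parse the modules it imported."""
    result = subprocess.run(
        [python, "-v", "-S", "-c", "pass"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The verbose import trace is written to stderr, not stdout.
    return parse_verbose_imports(result.stderr)
```

Calling startup_modules() with no arguments inspects the interpreter running the script; pass a different executable path to inspect another Python build.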
usage.json lists the modules used by the 4000 most downloaded projects
over the past year on PyPI.
The projects are listed in top-pypi-packages-365-days.json as fetched
from Top PyPI Packages. The projects are downloaded by
isidentical/syntax_test_suite.
categories.json groups modules by category according to the
library index.
The __future__ module is treated specially and put in its own category.
commit_stats.json tracks the oldest commit, the newest commit, and the SHA
hashes of all the commits made on a specific file.
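The kind of per-file summary described here could be gathered from git log roughly as follows. This is only a sketch: the function names and the returned keys ("oldest", "newest", "shas") are hypothetical and are not the schema of commit_stats.json, and reading "oldest" and "newest" as committer dates is an assumption.

```python
from datetime import datetime
import subprocess


def summarize_commits(log_output):
    """Summarize `git log --format="%H %cI"` output for a single file.

    Each input line is expected to be "<sha> <ISO 8601 committer date>".
    The returned keys are hypothetical, chosen for this example only.
    """
    commits = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        sha, date = line.split(" ", 1)
        # Parse to aware datetimes so differing UTC offsets compare correctly.
        commits.append((datetime.fromisoformat(date.strip()), sha))
    if not commits:
        return {"oldest": None, "newest": None, "shas": []}
    dates = [date for date, _ in commits]
    return {
        "oldest": min(dates).isoformat(),
        "newest": max(dates).isoformat(),
        "shas": [sha for _, sha in commits],
    }


def file_commit_stats(repo, path):
    """Collect commit stats for `path` inside the git clone at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H %cI", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_commits(result.stdout)
```

For example, file_commit_stats("cpython", "Lib/json/__init__.py") would summarize one file, assuming a CPython clone exists in a directory named cpython.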