
Commits

List of commits on branch master.
4baec15379dd809b00d2f1aa5d988c9cffd80dc0

Create a new dataset object for each _generate_example call.

aa-googler committed 21 hours ago
855b1cd9084b1cd325437ba2db4d95c9aac341d6

Merge pull request #10963 from alexhartl:master

aa-googler committed 3 days ago
1322866308755ea234bae6de3c9255b8c1739206

Use f-string in Croissant utils ValueError.

aa-googler committed 4 days ago
dd96a074ed4729cff1a0904c0f54b76eef5776e5

Fix zarr version to <3.0.0 to avoid unittests to fail.

aa-googler committed 4 days ago
bc28396b1d459840a0d8dab64cbd4ebb3919d09a

Do not publish nightly releases on Github.

ffineguy committed 9 days ago
03d5af407cf26b3eadee4c0e1cdadc1d43647670

Automated metadata update.

aa-googler committed 9 days ago

README


TensorFlow Datasets

TensorFlow Datasets provides many public datasets as tf.data.Datasets.

Requires Python 3.10+.

Documentation

To install and use TFDS, we strongly encourage you to start with our getting started guide. You can also try it interactively in a Colab notebook.

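Our documentation contains tutorials and guides, the list of all available datasets, and the API reference. A minimal usage example: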

# !pip install tensorflow-datasets
import tensorflow_datasets as tfds
import tensorflow as tf

# Construct a tf.data.Dataset
ds = tfds.load('mnist', split='train', as_supervised=True, shuffle_files=True)

# Build your input pipeline
ds = ds.shuffle(1000).batch(128).prefetch(10).take(5)
for image, label in ds:
  pass
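The `shuffle(1000)` step in the pipeline above shuffles through a fixed-size buffer, so the order is only locally random. The following is a minimal pure-Python sketch of that idea, not the `tf.data` implementation. The names `buffered_shuffle` and `batched` are hypothetical helpers introduced here for illustration, and the integer range stands in for real examples.

```python
import itertools
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Yield items in a locally shuffled order using a fixed-size buffer.

    Hypothetical helper: an illustrative model of buffer-based shuffling.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)  # fixed seed, so the order is reproducible
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element, then reuse its slot.
        slot = rng.randrange(buffer_size)
        yield buffer[slot]
        buffer[slot] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of batch_size (the last one may be shorter)."""
    chunk: List[T] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == batch_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Stand-in for a dataset: example indices instead of (image, label) pairs.
examples = range(60_000)
shuffled = buffered_shuffle(examples, buffer_size=1000, seed=0)
first_batches = list(itertools.islice(batched(shuffled, 128), 5))

print(len(first_batches), len(first_batches[0]))  # 5 128
# The k-th emitted example can only come from the first 1000 + k inputs.
print(max(max(b) for b in first_batches) < 1000 + 5 * 128)  # True
```

In this model, the first five batches draw only from the first 1,640 inputs. This is why a buffer smaller than the dataset gives a partial shuffle, and why shuffling files as well (`shuffle_files=True`) helps.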

TFDS core values

TFDS has been built with these principles in mind:

  • Simplicity: Standard use-cases should work out of the box
  • Performance: TFDS follows best practices and can achieve state-of-the-art speed
  • Determinism/reproducibility: All users get the same examples in the same order
  • Customisability: Advanced users can have fine-grained control

If your use case is not covered, please send us feedback.

Want a certain dataset?

Adding a dataset is straightforward if you follow our guide.
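As a rough idea of what a new dataset involves, a TFDS builder's `_generate_examples` method yields `(key, example)` pairs, and each key must be unique within its split. The sketch below shows only that contract in plain Python. It is not a complete builder. The `text`/`label` fields and both helper functions are hypothetical names chosen for illustration, and the guide remains the authoritative reference.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Example = Dict[str, object]


def generate_examples(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[int, Example]]:
    """Yield (key, example) pairs in the shape a builder would produce.

    Hypothetical stand-in for the body of `_generate_examples`.
    """
    for index, (text, label) in enumerate(rows):
        # The row index serves as the key: it is unique within the split.
        yield index, {"text": text, "label": label}


def check_unique_keys(pairs: Iterable[Tuple[int, Example]]) -> List[Tuple[int, Example]]:
    """Collect the pairs, raising ValueError on a repeated key."""
    seen = set()
    collected = []
    for key, example in pairs:
        if key in seen:
            raise ValueError(f"Duplicate example key: {key!r}")
        seen.add(key)
        collected.append((key, example))
    return collected


# Toy in-memory data in place of downloaded files.
rows = [("great movie", 1), ("not for me", 0)]
for key, example in check_unique_keys(generate_examples(rows)):
    print(key, example)
```

In a real dataset, this generator would sit inside a `tfds.core.GeneratorBasedBuilder` subclass, alongside the methods that declare the features and the splits.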

Request a dataset by opening a Dataset request GitHub issue.

You can also vote on the current set of requests by adding a thumbs-up reaction to the issue.

Citation

Please include the following citation when using tensorflow-datasets for a paper, in addition to any citation specific to the datasets used.

@misc{TFDS,
  title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},
  howpublished = {\url{https://www.tensorflow.org/datasets}},
}

Disclaimers

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

If you're interested in learning more about responsible AI practices, including fairness, please see Google AI's Responsible AI Practices.

tensorflow/datasets is Apache 2.0 licensed. See the LICENSE file.