
image_embeddings

Uses efficientnet to compute image embeddings for retrieval. Read the blog post at https://medium.com/@rom1504/image-embeddings-ed1b194d113e

Why this repo? Embeddings are a widely used technique that is well known in scientific circles, but they seem to be underused by, and not very well known to, most engineers. I want to show how easy it is to represent things as embeddings, and how many applications this unlocks. Check out the demo first!

Workflow

  1. download some pictures
  2. run inference on them to get embeddings
  3. try a simple knn example to see what the point is: click on some pictures and look at their nearest neighbours

Simple Install

Run pip install image_embeddings

Example workflow

  1. run image_embeddings save_examples_to_folder --images_count=1000 --output_folder=tf_flower_images to retrieve 1000 image files from https://www.tensorflow.org/datasets/catalog/tf_flowers (you can also pick any other dataset)
  2. produce tf records with image_embeddings write_tfrecord --image_folder=tf_flower_images --output_folder=tf_flower_tf_records --shards=10
  3. run the inference with image_embeddings run_inference --tfrecords_folder=tf_flower_tf_records --output_folder=tf_flower_embeddings
  4. run a random knn search on them with image_embeddings random_search --path=tf_flower_embeddings

Optionally, if you want to use the embeddings as numpy arrays (for example from other languages), run image_embeddings embeddings_to_numpy --input_path=tf_flower_embeddings --output_path=tf_flower_numpy. In particular, the result can be used in the web demo.

$ image_embeddings random_search --path=tf_flower_embeddings
image_roses_261
160.83 image_roses_261
114.36 image_roses_118
102.77 image_roses_537
92.95 image_roses_659
88.49 image_roses_197
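
The same four steps can also be driven from Python. The sketch below only chains the functions listed in the API section, with the arguments used in the command-line example. The wrapper name `run_example_workflow` and its default folder names are illustrative and not part of the package. Imports are deferred into the function body so that defining the helper does not load TensorFlow.

```python
def run_example_workflow(
    image_folder="tf_flower_images",
    tfrecords_folder="tf_flower_tf_records",
    embeddings_folder="tf_flower_embeddings",
    images_count=1000,
    num_shards=10,
):
    """Chain the documented API calls: download, write tf records, infer, search.

    Hypothetical helper: only the four image_embeddings calls come from the README.
    """
    # Deferred imports: the package is only needed when the helper is called.
    from image_embeddings import downloader, inference, knn

    # 1. Save images from a tensorflow dataset (tf_flowers here) to disk.
    downloader.save_examples_to_folder(
        output_folder=image_folder, images_count=images_count, dataset="tf_flowers"
    )
    # 2. Convert the image folder to sharded tf records.
    inference.write_tfrecord(
        image_folder=image_folder, output_folder=tfrecords_folder, num_shards=num_shards
    )
    # 3. Run the model on the tf records and save the embeddings.
    inference.run_inference(
        tfrecords_folder=tfrecords_folder, output_folder=embeddings_folder
    )
    # 4. Apply a random search on the saved embeddings and display the result.
    knn.random_search(embeddings_folder)
```

Calling `run_example_workflow()` with no arguments should mirror the command-line example, assuming the package and its dependencies are installed.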

Explore the Simple notebook for more details.

You can run it locally or try it in Colab.

The From scratch notebook provides an explanation on how to build this from scratch.

API

image_embeddings.downloader

Downloader for tensorflow datasets. Any other set of images could be used instead.

image_embeddings.downloader.save_examples_to_folder(output_folder, images_count=1000, dataset="tf_flowers")

Save images from https://www.tensorflow.org/datasets/catalog/tf_flowers to a folder. Also works with other tf datasets.

image_embeddings.inference

Create tf records from image files, and run inference on them with an efficientnet model. Other models could be used.

image_embeddings.inference.write_tfrecord(image_folder, output_folder, num_shards=100)

Write tf records from an image folder
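
To give an idea of what `num_shards` controls, here is a minimal sketch of splitting a list of image files into a fixed number of shards. How the package actually assigns images to shards is not described here, so the round-robin split, the helper name `split_into_shards` and the file names are all assumptions made for illustration.

```python
def split_into_shards(file_names, num_shards):
    """Assign file names to num_shards lists in round-robin order."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    # Sort first so the assignment is deterministic across runs.
    for position, name in enumerate(sorted(file_names)):
        shards[position % num_shards].append(name)
    return shards


# Hypothetical file names, for illustration only.
files = [f"image_roses_{i}.jpeg" for i in range(7)]
shards = split_into_shards(files, num_shards=3)
# shards[0] holds files 0, 3 and 6; shards[1] holds 1 and 4; shards[2] holds 2 and 5.
```

Each shard could then be written to its own tf record file, which lets inference read the data in parallel-friendly chunks.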

image_embeddings.inference.run_inference(tfrecords_folder, output_folder, batch_size=1000)

Run inference on the provided tf records and save the embeddings to the output folder

image_embeddings.knn

Convenience methods to read embeddings, build indices and run searches on them. These methods are provided as examples. Use faiss directly for bigger datasets.

image_embeddings.knn.read_embeddings(path)

Read embeddings from path and return a tuple with

  • embeddings as a numpy matrix
  • an id to name dictionary
  • a name to id dictionary
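
The two dictionaries in that tuple are inverse lookups of each other. A minimal sketch of how such a pair can be built, assuming that the id of an image is simply its row position in the embeddings matrix (an assumption, as is the helper name `build_name_mappings`):

```python
def build_name_mappings(names):
    """Return (id_to_name, name_to_id) for a list of image names.

    Assumes the id of an image is its row position in the embeddings matrix.
    """
    id_to_name = dict(enumerate(names))
    name_to_id = {name: row for row, name in id_to_name.items()}
    return id_to_name, name_to_id


# Names borrowed from the sample random_search output.
names = ["image_roses_261", "image_roses_118", "image_roses_537"]
id_to_name, name_to_id = build_name_mappings(names)
print(id_to_name[1])                  # image_roses_118
print(name_to_id["image_roses_537"])  # 2
```

The id to name direction turns search results (row numbers) into image names, and the name to id direction finds the embedding row of a chosen query image.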

image_embeddings.knn.build_index(emb)

Build a simple faiss inner product index using the provided matrix of embeddings

image_embeddings.knn.search(index, id_to_name, emb, k=5)

Search the index with the query embeddings and return an array of (distance, name) pairs, one per matching image
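
Since the index is an inner product index, the "distance" in each pair is really a similarity score: larger means closer. The dependency-free sketch below shows the idea with a brute-force search over plain Python lists. It is not the faiss-based implementation, and the function names and toy vectors are made up for illustration.

```python
def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_search(embeddings, id_to_name, query, k=5):
    """Return the k (score, name) pairs with the largest inner product."""
    scored = [
        (inner_product(vector, query), id_to_name[row])
        for row, vector in enumerate(embeddings)
    ]
    # Highest score first: with inner product, larger means more similar.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings, for illustration only.
embeddings = [
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
]
id_to_name = {0: "a", 1: "b", 2: "c"}
results = brute_force_search(embeddings, id_to_name, query=embeddings[0], k=2)
print(results)  # [(1.0, 'a'), (0.8, 'b')]
```

Here the query image "a" ranks first in its own results, the same pattern as in the sample random_search output. Brute force scales linearly with the number of images, which is why an index library such as faiss is preferable for larger datasets.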

image_embeddings.knn.display_picture(image_path, image_name)

Display one picture from the given path and image name in jupyter

image_embeddings.knn.display_results(image_path, results)

Display the results returned by the search method

image_embeddings.knn.random_search(path)

Load the embeddings, apply a random search on them and display the result

image_embeddings.knn.embeddings_to_numpy(input_path, output_folder)

Load the embeddings from the input folder as parquet and save them as

  • json for the id -> name mapping
  • numpy for the embeddings

Particularly useful to read the embeddings from other languages
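
One detail to keep in mind when consuming the id -> name mapping: JSON object keys are always strings, so integer ids come back as "0", "1", and so on. The round trip below demonstrates this with the standard library only. The file name `id_to_name.json` and the flat dictionary layout are assumptions, since the exact files written by embeddings_to_numpy are not listed here.

```python
import json
import os
import tempfile

id_to_name = {0: "image_roses_261", 1: "image_roses_118"}

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name, for illustration only.
    mapping_path = os.path.join(folder, "id_to_name.json")
    with open(mapping_path, "w") as f:
        json.dump(id_to_name, f)
    with open(mapping_path) as f:
        loaded = json.load(f)

# The ids were written as JSON strings, so convert them back to integers.
restored = {int(key): name for key, name in loaded.items()}
print(loaded)    # {'0': 'image_roses_261', '1': 'image_roses_118'}
print(restored)  # {0: 'image_roses_261', 1: 'image_roses_118'}
```

The same conversion is needed in any language that uses the ids as row indices into the numpy embeddings array.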

Advanced Installation

Prerequisites

Make sure you use Python >= 3.6 and up-to-date versions of pip and setuptools:

python --version
pip install -U pip setuptools

It is recommended to install image_embeddings in a new virtual environment. For example:

python3 -m venv image_embeddings_env
source image_embeddings_env/bin/activate
pip install -U pip setuptools
pip install image_embeddings

Using Pip

pip install image_embeddings

From Source

First, clone the image_embeddings repo on your local machine with

git clone https://github.com/rom1504/image_embeddings.git
cd image_embeddings
make install

To install development tools and test requirements, run

make install-dev

Test

To run unit tests in your current environment, run

make test

To run lint + unit tests in a fresh virtual environment, run

make venv-lint-test

Lint

To run black --check:

make lint

To auto-format the code using black

make black

Tasks