Embedding-Quantization

Commits

List of commits on branch main.

Unverified

557193d7d9aa386c28482d6f63fb6935c33b0ace

readme

JJINO-ROHIT committed 8 months ago

Unverified

289abf195f824abe7f9441a44c9eb1cca01478a3

finishing up

JJINO-ROHIT committed 8 months ago

Verified

28265ed1b852f9e2bd2cc30ff5377705f8bc203e

Initial commit

JJINO-ROHIT committed 8 months ago

README

Binary and Scalar Quantization for Embeddings

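This repository shows how to apply binary and scalar quantization to embeddings, cutting storage cost by up to 32x and speeding up retrieval by up to 32x while retaining about 97% of the original retrieval performance. The 32x storage figure corresponds to binary quantization, which keeps 1 bit per dimension instead of a 32-bit float; scalar (int8) quantization keeps 8 bits per dimension, a 4x reduction.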

Adapted from - https://huggingface.co/blog/embedding-quantization
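
To make the two techniques concrete, here is a minimal NumPy sketch. It is not the code in `caching.py`; the function names (`binary_quantize`, `hamming_distance`, `scalar_quantize`), the random stand-in embeddings, and the 1024-dimension size are all assumptions chosen for illustration. Binary quantization thresholds each dimension at 0 and packs the bits; scalar quantization linearly maps each dimension onto the int8 range using per-dimension minima and maxima from a calibration set.

```python
import numpy as np


def binary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold each dimension at 0 and pack every 8 bits into one uint8."""
    bits = embeddings > 0
    return np.packbits(bits, axis=-1)


def hamming_distance(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and each packed corpus row."""
    xor = np.bitwise_xor(corpus, query)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def scalar_quantize(embeddings: np.ndarray, calibration: np.ndarray) -> np.ndarray:
    """Map each dimension linearly from its calibration [min, max] onto int8."""
    lo = calibration.min(axis=0)
    hi = calibration.max(axis=0)
    # Guard against constant dimensions, where hi == lo.
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    quantized = np.round((embeddings - lo) / scale) - 128
    return np.clip(quantized, -128, 127).astype(np.int8)


# Random vectors stand in for real model embeddings (an assumption for this sketch).
rng = np.random.default_rng(0)
float_embeddings = rng.standard_normal((1000, 1024)).astype(np.float32)

binary_embeddings = binary_quantize(float_embeddings)
int8_embeddings = scalar_quantize(float_embeddings, float_embeddings)

print(float_embeddings.nbytes // binary_embeddings.nbytes)  # 32
print(float_embeddings.nbytes // int8_embeddings.nbytes)    # 4

# Rank the corpus against the first vector by Hamming distance (smaller is closer).
distances = hamming_distance(binary_embeddings[0], binary_embeddings)
top10 = np.argsort(distances)[:10]
```

In practice, the binary search result is often rescored against higher-precision embeddings to recover accuracy; this sketch omits that step.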

Quick Start

To run the quantization pipeline:

```
python caching.py
```