This repository shows how to use binary and scalar quantization to reduce cost by 32x and improve latency by 32x along with retaining 97% of the original performance.
Adopted from - https://huggingface.co/blog/embedding-quantization
To run the quantization pipeline:
```
python caching.py
```