
simsity

public
142 stars
7 forks
3 issues

Commits

List of commits on branch main.
Unverified
5c107d15297085d255a1ddf169253fc58999643d

version

koaning committed 2 years ago
Verified
bd943b60b446766f1ae1c6ce35899dcb3ba420d0

Merge pull request #41 from koaning/faster-test

koaning committed 2 years ago
Unverified
648e78a9503aa133614962ed12725760501929ad

fix

koaning committed 2 years ago
Unverified
66580cc40200c0b8f5b0326ee4fc035233bd9c68

install-pyright

koaning committed 2 years ago
Unverified
bb572247a6dc9a967e5e6ffcb9faa59a692f1be8

actually use pytest

koaning committed 2 years ago
Unverified
22e67cad61eee5e82076cfb3c8b3aeb24933b16a

types

koaning committed 2 years ago

README

The README file for this repository.


simsity

Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!


This repository contains simple tools for similarity retrieval, built as a convenient wrapper around hnswlib. Typical use cases include early-stage bulk labelling and duplicate discovery.
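To make "similarity retrieval" concrete, here is a minimal sketch of the exact, brute-force search that an approximate index like hnswlib is meant to speed up. It is not how simsity works internally. The cosine metric, the toy 2-d vectors and the helper names `cosine_distance` and `exact_neighbours` are all assumptions made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen here: a zero vector matches nothing
    return 1.0 - dot / (norm_a * norm_b)


def exact_neighbours(query, vectors, items, n=3):
    """Brute-force search: score every item, then keep the n closest."""
    scored = sorted(
        (cosine_distance(query, vec), item) for vec, item in zip(vectors, items)
    )
    top = scored[:n]
    return [item for _, item in top], [dist for dist, _ in top]


# Toy 2-d "embeddings", invented for this example only.
items = ["pork chops", "pork belly", "apple pie"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
found, dists = exact_neighbours([1.0, 0.0], vectors, items, n=2)
```

Scoring every item like this costs time proportional to the size of the dataset for each query, which is why an approximate index becomes attractive once the dataset grows.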

Install

You can install simsity via pip.

python -m pip install simsity

The goal of simsity is to be minimal, to make rapid prototyping very easy, and to be "just enough" for medium-sized datasets. You will mainly interact with these two functions.

from simsity import create_index, load_index

As their names imply, you can use these to create an index or to load one from disk.

Quickstart

from simsity import create_index, load_index

# Let's fetch some demo data
from simsity.datasets import fetch_recipes
df_recipes = fetch_recipes()
recipes = df_recipes["text"]

# Let's use embetter for embeddings 
from embetter.text import SentenceEncoder
encoder = SentenceEncoder()

# Populate the ANN vector index and use it. 
index = create_index(recipes, encoder)
texts, dists = index.query("pork")

# You can also query using vectors
v_pork = encoder.transform(["pork"])[0]
texts, dists = index.query_vector(v_pork)
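For the duplicate discovery use case, one way to use `query` might look like the sketch below. It only relies on `index.query(text)` returning `(texts, dists)` as in the quickstart. The helper `find_near_duplicates`, its `threshold` value and the `FakeIndex` stand-in are hypothetical, and it assumes that a smaller distance means a closer match.

```python
def find_near_duplicates(index, texts, threshold=0.1):
    """Collect (text, neighbour, distance) triples closer than `threshold`.

    `index` is anything with a `query(text)` method returning `(texts, dists)`.
    `threshold` is a hypothetical cut-off that needs tuning per dataset.
    """
    pairs = []
    seen = set()
    for text in texts:
        neighbours, dists = index.query(text)
        for neighbour, dist in zip(neighbours, dists):
            key = frozenset((text, neighbour))
            if neighbour == text or key in seen:
                continue  # skip self-matches and pairs already reported
            if dist <= threshold:
                seen.add(key)
                pairs.append((text, neighbour, float(dist)))
    return pairs


class FakeIndex:
    """Hypothetical stand-in so the sketch runs without simsity installed."""

    def __init__(self, table):
        self.table = table

    def query(self, text):
        hits = self.table[text]
        return [t for t, _ in hits], [d for _, d in hits]


fake = FakeIndex({
    "pork stew": [("pork stew", 0.0), ("stew with pork", 0.04), ("apple pie", 0.9)],
    "stew with pork": [("stew with pork", 0.0), ("pork stew", 0.04)],
    "apple pie": [("apple pie", 0.0), ("pork stew", 0.9)],
})
duplicates = find_near_duplicates(fake, list(fake.table))
```

With a real index you would pass `index` and `recipes` instead of the fake objects, then inspect the returned pairs by hand before deleting anything.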

If you also provide a path, the index is stored on disk, so you can load it again later.

# Make an index with a path
index = create_index(recipes, encoder, path="demo")

# Load an index from a path
reloaded_index = load_index(path="demo", encoder=encoder)
texts, dists = reloaded_index.query("pork")
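When prototyping you often want to build the index once and reuse it on later runs. A minimal sketch of that pattern follows. The helper `get_or_create_index` is hypothetical, and checking `Path(path).exists()` is an assumption about how the index is laid out on disk. The `create_fn` and `load_fn` parameters exist only so the sketch can be tried with stand-ins.

```python
from pathlib import Path


def get_or_create_index(data, encoder, path, create_fn=None, load_fn=None):
    """Load the index stored at `path` if present, otherwise build and store it.

    `create_fn` and `load_fn` default to simsity's `create_index` and
    `load_index`, called with the same arguments as in the examples above.
    """
    if create_fn is None or load_fn is None:
        # Imported lazily so the function can be tested without simsity.
        from simsity import create_index, load_index
        create_fn = create_fn or create_index
        load_fn = load_fn or load_index

    if Path(path).exists():  # assumption: something at `path` means a saved index
        return load_fn(path=path, encoder=encoder)
    return create_fn(data, encoder, path=path)
```

With simsity installed, `get_or_create_index(recipes, encoder, path="demo")` would build the index on the first call and load it afterwards. Keep in mind that a stale index is loaded as-is if the data changed in the meantime.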

That's it! Happy hacking!