hyperminhash

Commits

List of commits on branch master.
Verified
dbf80f145b08ec342fc3377ff43290aa26f09283

add similarity test (#2)

jianshu93 committed 8 months ago
Unverified
96c1c52226f7b339f606ac7ad796621f731d1ac5

Bump gh actions/checkout

lukaslueg committed a year ago
Unverified
2ab3e2e353baf697bb739568348f154a64180b68

Bump to 0.1.3

lukaslueg committed a year ago
Unverified
3f129b705c44ccecd7b9888aded71476fca43de9

Update example-binary

lukaslueg committed a year ago
Unverified
11868f0d5f607166f9e5f6a3150b6f1be1471409

Add PartialEq, Eq, Hash to Sketch

lukaslueg committed a year ago
Unverified
018430564fb7bf3a385b263fda7d54d85527e420

Bump to 0.1.2

lukaslueg committed a year ago

README

The README file for this repository.

Hyperminhash for Rust

A straight port of Hyperminhash to Rust. It provides very fast cardinality approximation with a constant memory footprint, including intersection and union operations.

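use std::{fs, io, io::BufRead};

// `fname` is the path to a text file; each line is added to the sketch as one item.
let reader = io::BufReader::new(fs::File::open(fname)?).lines();
// Collecting into `io::Result<Sketch>` stops at the first I/O error.
let sketch = reader.collect::<io::Result<hyperminhash::Sketch>>()?;
println!("{}", sketch.cardinality());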
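
For comparison, the quantities a sketch approximates are the exact sizes of a set, of a union, and of an intersection. The following is a minimal sketch of that exact computation using only the standard library. It does not use the hyperminhash API, and the function name `exact_counts` is hypothetical. Unlike a sketch, it needs memory proportional to the number of distinct items.

```rust
use std::collections::HashSet;

/// Exact counterparts of the three quantities a sketch estimates.
/// Returns (|A|, |A ∪ B|, |A ∩ B|) for two collections of strings.
/// `exact_counts` is a hypothetical helper, not part of hyperminhash.
fn exact_counts(a: &[&str], b: &[&str]) -> (usize, usize, usize) {
    // Deduplicate each input; this is where memory grows with the data.
    let set_a: HashSet<&str> = a.iter().copied().collect();
    let set_b: HashSet<&str> = b.iter().copied().collect();
    let union = set_a.union(&set_b).count();
    let intersection = set_a.intersection(&set_b).count();
    (set_a.len(), union, intersection)
}

fn main() {
    let a = ["apple", "pear", "apple", "plum"];
    let b = ["plum", "fig", "pear", "kiwi"];
    let (card_a, union, intersection) = exact_counts(&a, &b);
    println!("|A| = {card_a}, |A ∪ B| = {union}, |A ∩ B| = {intersection}");
}
```
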

Two files of 10,000,000 random strings each:

| Operation | Runtime | Result |
| --- | --- | --- |
| Cardinality via `sort strings1.txt \| uniq \| wc -l` | 13.57 secs | 9,774,970 |
| Union via `cat strings1.txt strings2.txt \| sort \| uniq \| wc -l` | 84.4 secs | 19,122,087 |
| Intersection via `comm -12 <(sort strings1.txt) <(sort strings2.txt) \| wc -l` | 25.3 secs | 428,370 |
| Cardinality via Hyperminhash | 0.69 secs | 9,861,113 |
| Cardinality via Hyperminhash (multithreaded) | 0.15 secs | 9,971,928 |
| Union via Hyperminhash | 1.59 secs | 19,042,941 |
| Intersection via Hyperminhash | 1.52 secs | 430,977 |
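
To put the accuracy of these estimates in perspective, the relative error of each Hyperminhash result can be computed against the exact count from the table above. The snippet below is a small illustration of that arithmetic. It assumes each estimate is compared with the result of the corresponding shell pipeline and that the multithreaded run counts the same file as the single-threaded one. The helper `relative_error_pct` is a hypothetical name.

```rust
/// Signed relative error of an estimate against the exact value, in percent.
/// `relative_error_pct` is a hypothetical helper, not part of hyperminhash.
fn relative_error_pct(exact: f64, estimate: f64) -> f64 {
    (estimate - exact) / exact * 100.0
}

fn main() {
    // (operation, exact result, Hyperminhash estimate), copied from the table.
    let rows = [
        ("Cardinality", 9_774_970.0, 9_861_113.0),
        ("Cardinality (multithreaded)", 9_774_970.0, 9_971_928.0),
        ("Union", 19_122_087.0, 19_042_941.0),
        ("Intersection", 428_370.0, 430_977.0),
    ];
    for (name, exact, estimate) in rows {
        println!("{name}: {:+.2}%", relative_error_pct(exact, estimate));
    }
}
```

On these figures, the single-threaded cardinality, union, and intersection estimates all fall within 1% of the exact counts, and the multithreaded cardinality estimate is off by about 2%.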