
sdset

public
45 stars
3 forks
2 issues

Commits

List of commits on branch master.
Verified
e86d44b195061834acf5d40b0d947262237b9edc

Merge pull request #23 from tzcnt/master

tzcnt committed 2 years ago
Verified
83bc9d253749d84550b1f4ea4f8f2b344e1e136e

Implement IntoIterator for all set operations (#2)

tzcnt committed 2 years ago
Verified
05623c49cfafe2d80fba314211be268b5b417a67

Merge pull request #22 from tzcnt/master

tzcnt committed 2 years ago
Unverified
bb430a2b987b59fa6bf40457cebb70e092a13e1e

additional performance enhancement

tzcnt committed 2 years ago
Verified
bf73ba38c6d15a045cd6c84850a860eb49229a48

Merge pull request #21 from tzcnt/merge_ready

tzcnt committed 2 years ago
Unverified
f17671f3d83009e19ac94f782a16b75078efd4ce

inline(always) on internal search funcs

tzcnt committed 2 years ago

README

The README file for this repository.

SdSet


Set theory applied on sorted and deduplicated slices. Much performance! Such Wow!

API Documentation can be found on docs.rs.

sdset stands for sorted-deduplicated-slices-set, which is a little bit too long.
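
As a quick taste, here is a minimal sketch of a union of two sets with the duo OpBuilder, adapted from the examples in the API documentation on docs.rs; treat it as illustrative rather than authoritative.

use sdset::duo::OpBuilder;
use sdset::{SetOperation, Set, SetBuf, Error};

fn main() -> Result<(), Error> {
    // Both slices must already be sorted and deduplicated,
    // otherwise Set::new returns an error.
    let a = Set::new(&[1, 2, 4, 6, 7])?;
    let b = Set::new(&[2, 3, 4, 5, 6, 7])?;

    // Build the union operation and accumulate the result in a SetBuf.
    let op = OpBuilder::new(a, b).union();
    let res: SetBuf<i32> = op.into_set_buf();

    assert_eq!(&res[..], &[1, 2, 3, 4, 5, 6, 7]);
    Ok(())
}

The Set wrapper encodes the "sorted and deduplicated" invariant, which is what lets the operations skip any preprocessing of their inputs.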

Performance

A note about the benchmarks, which are run on ranges of integers; if a benchmark name ends with:

  • two_slices_big, the first slice contains 0..100 and the second contains 1..101
  • two_slices_big2, the first contains 0..100 and the second contains 51..151
  • two_slices_big3, the first contains 0..100 and the second contains 100..200
  • three_slices_big, the first contains 0..100, the second contains 1..101 and the third contains 2..102
  • three_slices_big2, the first contains 0..100, the second contains 34..134 and the third contains 67..167
  • three_slices_big3, the first contains 0..100, the second contains 100..200 and the third contains 200..300

These runs of integers are useful because they overlap: we can see how performance changes as different parts of the slices overlap.
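
As an illustration (a hypothetical sketch, not the benchmark code itself), the two_slices_big2 case can be reproduced directly, since consecutive ranges of integers are naturally sorted and deduplicated:

use sdset::duo::OpBuilder;
use sdset::{SetOperation, Set, SetBuf, Error};

fn main() -> Result<(), Error> {
    // Overlapping runs of integers, as in the two_slices_big2 case above.
    let first: Vec<i32> = (0..100).collect();
    let second: Vec<i32> = (51..151).collect();

    // Consecutive ranges are already sorted and deduplicated, so Set::new accepts them.
    let a = Set::new(&first)?;
    let b = Set::new(&second)?;

    // Only the overlapping part (51..100) survives the intersection.
    let res: SetBuf<i32> = OpBuilder::new(a, b).intersection().into_set_buf();
    assert_eq!(res.len(), 49);
    Ok(())
}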

For more information on "Why are there no benchmarks on really big slices?", you can see my response on /r/rust.

To run the benchmarks you must enable the unstable feature.

$ cargo bench --features unstable

Note that the sdset set operations do not need many allocations, so they start with a serious advantage. For more information you can look at the benchmark variance.

The _btree benchmarks use two or three BTreeSets containing runs of integers (see above); the creation of the BTreeSets is not taken into account. The set operations are done on these sets and the result is accumulated in a final Vec.

The _fnv benchmarks use two or three HashSets containing runs of integers (see above), built with a custom Hasher named fnv that is specialized for small values like integers; the creation of the HashSets is not taken into account. The set operations are done on these sets and the result is accumulated in a final Vec.

The _vec benchmarks are available for the union set operation only: a Vec is populated with the elements of two or three slices (see above), then sorted and deduplicated.

The duo and multi measurements are the implementations that are part of this crate; the first one can only do set operations on two sets, while the second one can be used with any number of sets.
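
For comparison, here is an illustrative sketch (not the exact benchmark code) of what a _vec style union boils down to: every element is pushed into a Vec, then sorted and deduplicated, which is exactly the extra work that the duo and multi implementations avoid on already prepared slices.

// Illustrative sketch of a _vec style union baseline:
// collect everything, then sort and deduplicate.
fn vec_union(a: &[i32], b: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out.sort_unstable();
    out.dedup();
    out
}

fn main() {
    let first: Vec<i32> = (0..100).collect();
    let second: Vec<i32> = (51..151).collect();
    // 0..100 union 51..151 covers 0..151, i.e. 151 distinct integers.
    assert_eq!(vec_union(&first, &second).len(), 151);
}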

Histograms

Histograms can be generated from the benchmarks by executing the following commands:

$ export CARGO_BENCH_CMD='cargo bench --features unstable'
$ ./gen_graphs.sh xxx.bench

This makes it much easier to read the statistics and to see how much more performant sdset is on already sorted and deduplicated slices than any other kind of collection.

Histograms are generated for the difference, intersection, union, and symmetric difference benchmarks.