optimyze / simple_simhash

A pure ANSI C implementation that computes a SimHash over the 4-byte tuples (counted with multiplicity) of a given byte stream. It is simple and reasonably fast, and it performs no dynamic memory allocation, relying only on a bounded amount of stack. A counting Bloom filter tracks tuple multiplicities while keeping memory consumption constant.
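To make the description concrete, here is a minimal sketch of the general technique, not the repository's actual code. Everything in it is an assumption made for illustration: the function name `simhash_sketch`, the 64-bit hash width, the filter size `CBF_SLOTS`, the two-probe counting Bloom filter, the splitmix64-style mixer, and the choice to fold the occurrence count into the feature so that repeated tuples contribute distinct features. It uses C99 fixed-width integer types for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CBF_SLOTS 4096u /* hypothetical filter size; must be a power of two */

/* splitmix64-style mixer, used here as a stand-in hash function. */
static uint64_t mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Hypothetical entry point: 64-bit SimHash over all overlapping
 * 4-byte tuples of data[0..len). Returns 0 when len < 4. */
uint64_t simhash_sketch(const unsigned char *data, size_t len)
{
    uint8_t counters[CBF_SLOTS]; /* counting Bloom filter, on the stack */
    int64_t weights[64];         /* one signed vote total per output bit */
    uint64_t result = 0;
    size_t i;
    int b;

    memset(counters, 0, sizeof counters);
    memset(weights, 0, sizeof weights);

    for (i = 0; i + 4 <= len; i++) {
        uint32_t tuple = (uint32_t)data[i]
                       | ((uint32_t)data[i + 1] << 8)
                       | ((uint32_t)data[i + 2] << 16)
                       | ((uint32_t)data[i + 3] << 24);
        uint64_t h = mix64(tuple);
        size_t s1 = (size_t)(h & (CBF_SLOTS - 1));
        size_t s2 = (size_t)((h >> 32) & (CBF_SLOTS - 1));
        /* Estimated number of earlier occurrences: minimum of both slots. */
        uint8_t seen = counters[s1] < counters[s2] ? counters[s1] : counters[s2];
        uint64_t feature;

        /* Saturating increments keep the counters from wrapping around. */
        if (counters[s1] < UINT8_MAX) counters[s1]++;
        if (s2 != s1 && counters[s2] < UINT8_MAX) counters[s2]++;

        /* The n-th occurrence of a tuple becomes its own feature. */
        feature = mix64(((uint64_t)seen << 32) | tuple);

        /* Each feature votes +1 or -1 on every output bit. */
        for (b = 0; b < 64; b++)
            weights[b] += ((feature >> b) & 1) ? 1 : -1;
    }

    /* An output bit is set when its votes sum to a positive value. */
    for (b = 0; b < 64; b++)
        if (weights[b] > 0)
            result |= (uint64_t)1 << b;

    return result;
}
```

Two similar byte streams share most of their tuples, so their hashes should differ in only a few bits; the Hamming distance between the two 64-bit results then serves as the similarity measure. The filter is a fixed-size array, so memory use does not grow with the input, at the cost of occasionally overestimating a count when slots collide.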
45 · Updated 5 years ago

Alternatives and similar repositories for simple_simhash:

Users that are interested in simple_simhash are comparing it to the libraries listed below