s3curitybug / similarity-uniform-fuzzy-hash
Similarity algorithm (computes the similarity between two files as a score from 0 to 1) with linear complexity, based on context-triggered piecewise (fuzzy) hashes.
☆34 · Updated 7 years ago
Alternatives and similar repositories for similarity-uniform-fuzzy-hash
Users who are interested in similarity-uniform-fuzzy-hash are comparing it to the libraries listed below.
- Java implementation of Lempel-Ziv Jaccard Distance ☆21 · Updated 8 years ago
- A Java library for byte pattern matching and searching ☆41 · Updated 4 years ago
- Advanced similarity and duplicate source code at scale. ☆56 · Updated 6 years ago
- ☆40 · Updated 5 years ago
- The LAW next generation crawler. ☆89 · Updated 3 years ago
- A Mixed Trie and Levenshtein distance implementation in Java for extremely fast prefix string searching and string similarity. ☆44 · Updated 3 years ago
- An ANTLR 4 grammar for PCRE ☆30 · Updated last year
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆71 · Updated last month
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆73 · Updated last year
- Match tens of thousands of regular expressions within milliseconds - Java bindings for Intel's hyperscan 5 ☆193 · Updated this week
- Natural language detection, Java bindings for CLD2 ☆14 · Updated this week
- Fast multi-string search ☆16 · Updated 2 years ago
- Sensitive Data Management: Data Discovery and Anonymization toolkit ☆155 · Updated last month
- Zero-downtime schema evolution for PostgreSQL ☆66 · Updated 2 years ago
- ShiftLeft OverflowDB ☆130 · Updated 4 months ago
- Locality-sensitive hashing algorithm for text similarity comparisons ☆58 · Updated 5 months ago
- No-nonsense, actually-working Java bindings to FUSE using JNA. ☆141 · Updated 8 years ago
- JSuffixArrays (Suffix Arrays in Java) ☆59 · Updated 8 years ago
- Dataset for programming language identification. ☆23 · Updated 2 years ago
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex ☆19 · Updated 2 years ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json dump. Questions? https://gitter.im/idio-opensource/Lobby ☆17 · Updated 3 years ago
- Java library for fast multiple strings matchings. Uses internally Aho-Corasick or Commentz-Walter. ☆18 · Updated 2 years ago
- ☆18 · Updated 9 years ago
- Java serialization, faster and space efficient version of ObjectOutputStream ☆46 · Updated 4 years ago
- Java port of smaz, a small string compression algorithm ☆40 · Updated 4 years ago
- Multithreaded, gzip-compatible compression and decompression, available as a platform-independent Java library and command-line utilities… ☆81 · Updated 5 years ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash) ☆21 · Updated 4 years ago
- Java filesystems as FUSE ☆110 · Updated 2 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts ☆81 · Updated 7 years ago
- A fast and comprehensive Java library capable of performing automaton and non-automaton based Levenshtein distance determination and neig… ☆43 · Updated 12 years ago