s3curitybug / similarity-uniform-fuzzy-hash
Similarity algorithm (computes the similarity between two files as a 0 to 1 score) with linear complexity, based on context triggered piecewise (fuzzy) hashes.
☆34Updated 8 years ago
Alternatives and similar repositories for similarity-uniform-fuzzy-hash
Users who are interested in similarity-uniform-fuzzy-hash are comparing it to the libraries listed below.
- Advanced similarity and duplicate source code at scale.☆56Updated 6 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test…☆73Updated last week
- Java implementation of Lempel-Ziv Jaccard Distance☆21Updated 8 years ago
- The LAW next generation crawler.☆90Updated 4 years ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash)☆22Updated 4 years ago
- A Java library for byte pattern matching and searching☆41Updated 4 years ago
- ☆49Updated 8 years ago
- This module contains an implementation of the Nilsimsa locality-sensitive hashing algorithm in Java.☆18Updated 6 years ago
- Dataset for programming language identification.☆24Updated 2 years ago
- Collects multimedia content shared through social networks.☆19Updated 10 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts.☆52Updated 3 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.☆47Updated 8 years ago
- Detect memory leaks in minutes without a heap dump.☆17Updated 8 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons☆59Updated 8 months ago
- Anomaly detection framework @ PayPal☆108Updated 6 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts☆81Updated 7 years ago
- Bindings to Google's Compact Language Detector 3 to JVM Based Languages☆21Updated last year
- re2 for Java☆27Updated 10 years ago
- JDBC driver for data.world☆18Updated last year
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance☆16Updated 7 years ago
- Learning framework for program property prediction☆217Updated 4 years ago
- A Java Library to detect and mask sensitive data☆60Updated 8 years ago
- A Java agent for tracing which can be configured via a simple text file and instruments the code without rebuilding the project.☆49Updated last year
- SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm☆18Updated 10 years ago
- Babelfish Python client☆17Updated 6 years ago
- Algorithms for URL Classification☆19Updated 10 years ago
- Markov Chain based fraud detection system in Spark.☆13Updated 9 years ago
- Multithreaded, gzip-compatible compression and decompression, available as a platform-independent Java library and command-line utilities…☆81Updated 5 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆197Updated last week
- The official home of searchcode-server where you can run searchcode locally. Note that master is generally unstable in the sense that it i…☆389Updated 5 months ago