mauidude / deduper
Find near-duplicate documents using minhashing implemented in Go.
☆16 Updated 9 years ago
Alternatives and similar repositories for deduper
Users who are interested in deduper are comparing it to the libraries listed below.
- A parser for arbitrarily-formatted dates/times. ☆17 Updated 7 years ago
- Various parsing utilities, such as IP, time, and top-level-domain, in Go ☆25 Updated 9 years ago
- A DAG (Directed Acyclic Graph) implementation of a Bayesian Network enabling Ancestral Sampling and Gibbs Sampling on Binary Discrete Var… ☆16 Updated 12 years ago
- Bleve Extensions ☆46 Updated last year
- simhash storage and searching ☆138 Updated 8 years ago
- Go package for loading OpenSSH keys. ☆37 Updated 12 years ago
- Trigram search library for Go ☆69 Updated 11 years ago
- xast: ast rewriter with built-in clean up. ☆27 Updated 8 years ago
- word2vec in go lang ☆71 Updated 12 years ago
- A pure Go interface to the free MaxMind GeoIP database ☆40 Updated last year
- Text indexing related functions in Go, including tokenizer, word marking, and snippet selecting, etc. ☆26 Updated 9 years ago
- Fast identification of character sequences in text or documents (multi-lingual) ☆18 Updated 9 years ago
- All-in-one text tokenizer for Go. Super-fast. Lots of features. ☆13 Updated 9 years ago
- Go package to detect interesting portions of images ☆60 Updated last year
- Mount a BoltDB (https://github.com/boltdb/bolt) database as a FUSE filesystem; ☆116 Updated 5 years ago
- shoco is a compressor for small text strings. [Not maintained]. ☆10 Updated 6 years ago
- A Go package that implements the JusText boilerplate removal algorithm ☆110 Updated 3 years ago
- Super simple, concurrent worker queue in golang ☆68 Updated 6 years ago
- Easy handling of memory-mapped files ☆22 Updated 11 years ago
- Python List and Dict for Go ☆24 Updated 11 years ago
- Summarizes text ☆39 Updated 10 years ago
- CLD2 (Compact Language Detector 2) bindings for Go (golang) ☆38 Updated 6 years ago
- liblinear bindings for Go ☆45 Updated 7 years ago
- Path parsing for segment unmarshaling and slicing. ☆47 Updated 10 months ago
- S-Bitmap: Distinct Counting with a Self-Learning Bitmap ☆37 Updated 10 years ago
- Pure-Go full text indexer and search library ☆94 Updated 10 years ago
- Ngram index for golang ☆114 Updated 9 years ago
- Directed Acyclic Word Graph implementation in Go, with fuzzy search of words in the graph. ☆31 Updated 11 years ago
- GoLang Library for Browser Capabilities Project ☆49 Updated 2 years ago
- agrep-like fuzzy matching, but made faster using Golang and precomputation. ☆46 Updated 9 years ago