mauidude / deduper
Find near-duplicate documents using minhashing implemented in Go.
☆16 Updated 9 years ago
Alternatives and similar repositories for deduper
Users who are interested in deduper are comparing it to the libraries listed below.
- Fast identification of character sequences in text or documents (multi-lingual) ☆18 Updated 9 years ago
- Trigram search library for Go ☆69 Updated 10 years ago
- Pure-Go full text indexer and search library ☆93 Updated 10 years ago
- Go package for loading OpenSSH keys. ☆37 Updated 11 years ago
- simhash storage and searching ☆138 Updated 8 years ago
- Read and use word2vec vectors in Go ☆56 Updated 6 years ago
- All-in-one text tokenizer for Go. Super-fast. Lots of features. ☆13 Updated 9 years ago
- An implementation of the Goose HTML Content / Article Extractor algorithm in golang ☆40 Updated 4 years ago
- A DAG (Directed Acyclic Graph) implementation of a Bayesian Network enabling Ancestral Sampling and Gibbs Sampling on Binary Discrete Var… ☆16 Updated 12 years ago
- A Go package that implements the JusText boilerplate removal algorithm ☆109 Updated 2 years ago
- A dual interface Go module for building simple web scrapers ☆52 Updated last year
- Mount a BoltDB (https://github.com/boltdb/bolt) database as a FUSE filesystem; ☆117 Updated 5 years ago
- A pure Go interface to the free MaxMind GeoIP database ☆40 Updated last year
- liblinear bindings for Go ☆45 Updated 6 years ago
- Bleve Extensions ☆48 Updated last year
- Go bindings for the Apache Lucy full text search library. The Apache Lucy search engine library provides full-text search for dynamic pro… ☆47 Updated 10 years ago
- Named Entity Recognition for golang via MITIE ☆34 Updated 6 years ago
- Various parsing utilities, such as IP, time, and top-level-domain, in Go ☆25 Updated 9 years ago
- A library to find the percentage of similarity between two given strings (can be expanded to compare everything!). ☆46 Updated 12 years ago
- GoLang Library for Browser Capabilities Project ☆49 Updated 2 years ago
- doc2vec, word2vec, implemented by golang. word embedding representation ☆41 Updated 7 years ago
- agrep-like fuzzy matching, but made faster using Golang and precomputation. ☆46 Updated 8 years ago
- Unicode transliterator in Golang - Replaces non-ASCII characters with their ASCII approximations. ☆49 Updated 9 years ago
- A general purpose application which can be used to host read-only access to one or more Bleve indexes ☆13 Updated 8 years ago
- Text indexing related functions in Go, including tokenizer, word marking, and snippet selecting, etc. ☆26 Updated 9 years ago
- Path parsing for segment unmarshaling and slicing. ☆47 Updated 6 months ago
- xast: ast rewriter with built-in clean up. ☆27 Updated 7 years ago
- A Go package for n-gram based text categorization, with support for utf-8 and raw text ☆73 Updated 8 months ago
- web-based UI editor for bleve index mappings ☆23 Updated 3 months ago
- Python List and Dict for Go ☆24 Updated 11 years ago