mauidude / deduper
Find near-duplicate documents using minhashing implemented in Go.
☆16 Updated 10 years ago
Alternatives and similar repositories for deduper
Users that are interested in deduper are comparing it to the libraries listed below
- Fast identification of character sequences in text or documents (multi-lingual) ☆18 Updated 9 years ago
- Go package for loading OpenSSH keys. ☆37 Updated 12 years ago
- Various parsing utilities, such as IP, time, and top-level-domain, in Go ☆25 Updated 9 years ago
- simhash storage and searching ☆138 Updated 8 years ago
- GoLang Library for Browser Capabilities Project ☆49 Updated 2 years ago
- A parser for arbitrarily-formatted dates/times. ☆17 Updated 8 years ago
- A simple library for loading word2vec binary model. ☆12 Updated 10 years ago
- Mount a BoltDB (https://github.com/boltdb/bolt) database as a FUSE filesystem; ☆116 Updated 5 years ago
- An implementation of the Goose HTML Content / Article Extractor algorithm in golang ☆40 Updated 4 years ago
- A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29 ☆88 Updated 3 years ago
- Pure-Go full text indexer and search library ☆94 Updated 10 years ago
- A pure Go interface to the free MaxMind GeoIP database ☆40 Updated 2 years ago
- Trigram search library for Go ☆69 Updated 11 years ago
- word2vec in go lang ☆71 Updated 12 years ago
- shoco is a compressor for small text strings. [Not maintained]. ☆10 Updated 6 years ago
- A dual interface Go module for building simple web scrapers ☆53 Updated 3 months ago
- Path parsing for segment unmarshaling and slicing. ☆47 Updated last year
- doc2vec, word2vec, implemented by golang. word embedding representation ☆41 Updated 7 years ago
- Package for concurrently walking files ☆102 Updated 9 years ago
- A Go package that implements the JusText boilerplate removal algorithm ☆110 Updated 3 years ago
- Ngram index for golang ☆114 Updated 9 years ago
- Pattern recognition package in Go lang. ☆68 Updated 12 years ago
- ☆19 Updated 8 years ago
- Summarizes text ☆39 Updated 10 years ago
- CLD2 (Compact Language Detector 2) bindings for Go (golang) ☆38 Updated 6 years ago
- A Go package for n-gram based text categorization, with support for utf-8 and raw text ☆73 Updated last year
- Unicode transliterator in Golang - Replaces non-ASCII characters with their ASCII approximations. ☆51 Updated 9 years ago
- liblinear bindings for Go ☆45 Updated 7 years ago
- Read and use word2vec vectors in Go ☆58 Updated 7 years ago
- Library to extract text from HTML files ☆11 Updated 10 years ago