mauidude / deduper
Find near-duplicate documents using minhashing implemented in Go.
☆16 Updated 9 years ago
Alternatives and similar repositories for deduper:
Users who are interested in deduper are comparing it to the libraries listed below.
- Various parsing utilities, such as IP, time, and top-level-domain, in Go ☆24 Updated 8 years ago
- An implementation of the Goose HTML Content / Article Extractor algorithm in golang ☆40 Updated 3 years ago
- Bleve Extensions ☆47 Updated 9 months ago
- Trigram search library for Go ☆70 Updated 10 years ago
- GoLang Library for Browser Capabilities Project ☆49 Updated last year
- Pure-Go full text indexer and search library ☆94 Updated 9 years ago
- textextract is a tiny library (87 lines of Go) that identifies where the article content is in a HTML page (as opposed to navigation, hea… ☆11 Updated 6 years ago
- A dual interface Go module for building simple web scrapers ☆50 Updated last year
- doc2vec, word2vec, implemented by golang. word embedding representation ☆41 Updated 6 years ago
- TLS wrapper/proxy in go! ☆0 Updated 8 years ago
- Go package for loading OpenSSH keys. ☆37 Updated 11 years ago
- Fast identification of character sequences in text or documents (multi-lingual) ☆18 Updated 8 years ago
- Path parsing for segment unmarshaling and slicing. ☆47 Updated 8 months ago
- flash text is a simple and fast keyword extract tool in go ☆29 Updated 5 years ago
- A general purpose application which can be used to host read-only access to one or more Bleve indexes ☆13 Updated 8 years ago
- Read and use word2vec vectors in Go ☆56 Updated 6 years ago
- Go bindings for the Apache Lucy full text search library. The Apache Lucy search engine library provides full-text search for dynamic pro… ☆47 Updated 10 years ago
- Strumt is a library to create prompt chain ☆62 Updated 2 months ago
- a tiny package that implements SMTP server for Go projects ☆106 Updated last year
- liblinear bindings for Go ☆45 Updated 6 years ago
- web-based UI editor for bleve index mappings ☆24 Updated last month
- A DAG (Directed Acyclic Graph) implementation of a Bayesian Network enabling Ancestral Sampling and Gibbs Sampling on Binary Discrete Var… ☆16 Updated 11 years ago
- P-Square Algorithm in Go ☆36 Updated 2 years ago
- A Go package that implements the JusText boilerplate removal algorithm ☆107 Updated 2 years ago
- BottomK minwise hashing for streaming set similarity ☆42 Updated 5 years ago
- A pure Go interface to the free MaxMind GeoIP database ☆39 Updated last year
- gooverssh - forwards over ssh. ☆26 Updated 8 years ago
- CLD2 (Compact Language Detector 2) bindings for Go (golang) ☆38 Updated 5 years ago
- Package httpgzip provides net/http-like primitives that use gzip compression when serving HTTP requests. ☆24 Updated last year