crawlerclub / ce
HTML article content extractor in Golang.
☆12 Updated 2 years ago
Alternatives and similar repositories for ce:
Users who are interested in ce are comparing it to the libraries listed below.
- Unsupervised Word Discovery ☆10 Updated 5 years ago
- Go-flashtext is a flashtext implementation written in Go (Golang). It is based on the FlashText algorithm. ☆19 Updated 3 years ago
- Read and use word2vec vectors in Go ☆56 Updated 6 years ago
- gRPC server for hnswlib ☆14 Updated last year
- Natural Language Processing Toolkit in Golang ☆63 Updated 4 years ago
- A declarative, SQL-like DSL for data integration tasks. ☆14 Updated 6 years ago
- Go implementation of today's most used tokenizers ☆41 Updated 4 years ago
- Go Based Lightweight RAG / LLM Tool with CLI + API ☆11 Updated last year
- doc2vec and word2vec implemented in Golang, for word embedding representation ☆41 Updated 6 years ago
- flash text is a simple and fast keyword extraction tool in Go ☆29 Updated 5 years ago
- A project around helping to prevent typing typos. TySug (Typo Suggestions) suggests alternative words with respect to keyboard layouts ☆18 Updated last year
- Bleve Extensions ☆47 Updated 9 months ago
- An Inverted Index generator implemented in Go used for text search in large document sets. ☆18 Updated 5 years ago
- Extract content from HTML by removing unwanted boilerplate text. ☆9 Updated 7 years ago
- A streaming ETL for fish ☆13 Updated 5 years ago
- Go library for accessing the Paddle API ☆10 Updated 2 years ago
- A general purpose application which can be used to host read-only access to one or more Bleve indexes ☆13 Updated 8 years ago
- ☆18 Updated 3 years ago
- Genetic Algorithm and Particle Swarm Optimization ☆33 Updated 3 years ago
- Unofficial C binding for Onnxruntime in Golang. ☆16 Updated last year
- Tiny little queue on top of sqlite written in Go ☆12 Updated 11 months ago
- Easy handling of memory-mapped files ☆22 Updated 10 years ago
- Type-safe, automatic, asynchronous batch processing. ☆16 Updated 7 months ago
- Utilities for processing Wikipedia and Wikidata dumps in Go. Read-only mirror of https://gitlab.com/tozd/go/mediawiki ☆11 Updated last month
- An implementation of the Goose HTML Content / Article Extractor algorithm in golang ☆40 Updated 3 years ago
- Orbiter is a tool for collecting and redistributing webhooks over the network. ☆20 Updated 3 years ago
- A simple library for loading word2vec binary model. ☆12 Updated 9 years ago
- Package assocentity returns the mean distance from tokens to an entity and its synonyms ☆15 Updated last year
- Summarizes text ☆38 Updated 9 years ago
- Go implementation of simhash algorithm ☆41 Updated 7 years ago