slyrz / warc
Read and write WARC files in Go
☆41 · Updated 6 years ago
Related projects
Alternatives and complementary repositories for warc
- A Go library for working with WARC files from Common Crawl ☆14 · Updated 6 years ago
- Go readers for the ARC and WARC web archive formats ☆20 · Updated last year
- Package mbox parses the mbox file format into messages and formats messages into mbox files ☆69 · Updated 2 years ago
- Stemmer packages for the Go programming language. Includes English, German and Dutch stemmers. ☆53 · Updated 7 years ago
- An implementation of the Goose HTML content/article extractor algorithm in Go ☆40 · Updated 3 years ago
- grobotstxt is a native Go port of Google's robots.txt parser and matcher library. ☆108 · Updated 2 years ago
- Golang WARC (Web ARChive) Library ☆28 · Updated 5 years ago
- A native golang implementation of cdb (http://cr.yp.to/cdb.html) ☆65 · Updated 5 years ago
- CLD2 (Compact Language Detector 2) bindings for Go (golang) ☆38 · Updated 5 years ago
- An approximate string matching library for the Go programming language. ☆178 · Updated 2 years ago
- Levenshtein Distance in Go ☆39 · Updated 6 years ago
- Offline language detection ☆47 · Updated 7 years ago
- Latent Dirichlet Allocation ☆30 · Updated 2 years ago
- A generic patricia trie (also called radix tree) implemented in Go (Golang) ☆28 · Updated 5 years ago
- Replication for BoltDB databases. ☆28 · Updated 7 years ago
- Run headless Chromium from Go ☆31 · Updated 7 years ago
- SimHash storage and search ☆138 · Updated 7 years ago
- Serve millions of JSON documents via HTTP. ☆66 · Updated 2 weeks ago
- A Go package that implements the jusText boilerplate removal algorithm ☆102 · Updated 2 years ago
- Miscellaneous tools for processing WARC files from Common Crawl ☆22 · Updated 10 years ago
- A pure Go implementation of the smaz compression library for short strings. ☆20 · Updated 8 years ago
- Golang package to extract useful text from an HTML document ☆39 · Updated last year
- Pure Go implementation of cryptographic APIs found in libsodium ☆45 · Updated 4 years ago
- A Go implementation of the Readability algorithm by Arc90 Labs ☆132 · Updated 2 years ago
- Golang: Matroska and WebM format ☆37 · Updated last year
- Package valuegraph produces a graph representation of any Go value. ☆32 · Updated 6 years ago
- Text summarizer for Go using LexRank ☆126 · Updated 7 months ago
- Bottom-k min-wise hashing for streaming set similarity ☆42 · Updated 5 years ago
- Go XML Pull Parser ☆31 · Updated 8 months ago
- Generate man pages from Go source ☆159 · Updated 9 years ago