slyrz / warc
Read and write WARC files in Go
☆40 Updated 6 years ago
Related projects:
- golang readers for ARC and WARC webarchive formats ☆20 Updated last year
- A golang library to work with WARC files from the common crawl ☆14 Updated 6 years ago
- Package mbox parses the mbox file format into messages and formats messages into mbox files ☆69 Updated 2 years ago
- CLD2 (Compact Language Detector 2) bindings for Go (golang) ☆37 Updated 4 years ago
- Golang WARC (Web ARChive) Library ☆29 Updated 5 years ago
- A pure Go implementation of the smaz compression library for short strings. ☆20 Updated 8 years ago
- A Reader/ReaderAt for Go that uses Range requests to get files over HTTP ☆27 Updated 2 years ago
- Takes a full name and splits it into individual name parts ☆42 Updated last week
- An approximate string matching library for the Go programming language. ☆178 Updated last year
- mediawiki dump parser for loading up wikipedia data ☆97 Updated 9 months ago
- High Performance Porter2 Stemmer ☆46 Updated 3 years ago
- A Go package that implements the JusText boilerplate removal algorithm ☆102 Updated last year
- Stemmer packages for Go programming language. Includes English, German and Dutch stemmers. ☆52 Updated 7 years ago
- A native golang implementation of cdb (http://cr.yp.to/cdb.html) ☆65 Updated 5 years ago
- An implementation of the Goose HTML Content / Article Extractor algorithm in golang ☆40 Updated 3 years ago
- Miscellaneous tools for processing WARC files from the CommonCrawl ☆22 Updated 10 years ago
- Package valuegraph produces a graph representation of any Go value. ☆32 Updated 6 years ago
- simhash storage and searching ☆138 Updated 7 years ago
- Go package to parse GEDCOM files. ☆37 Updated 3 weeks ago
- Serve millions of JSON documents via HTTP. ☆66 Updated 10 months ago
- A Go implementation of the readability algorithm by arc90 labs ☆132 Updated 2 years ago
- Text summarizer for golang using LexRank ☆126 Updated 5 months ago
- Summarizes text ☆38 Updated 9 years ago
- 📖⏎ An efficient and flexible word-wrapping package for Go (golang) ☆16 Updated 3 years ago
- Middleware for keeping track of users, login states and permissions ☆88 Updated last year
- ☆20 Updated last year
- Tokenizers and lemmatizers for Go ☆107 Updated 3 months ago
- Webpage summary extractor using Facebook Open Graph and arc90's readability ☆69 Updated 5 years ago
- Latent Dirichlet Allocation ☆30 Updated 2 years ago
- Offline language detection ☆46 Updated 7 years ago