slyrz / warc
Read and write WARC files in Go
☆45 Updated 7 years ago
Alternatives and similar repositories for warc:
Users that are interested in warc are comparing it to the libraries listed below
- golang readers for ARC and WARC webarchive formats ☆20 Updated 2 years ago
- A golang library to work with WARC files from the common crawl ☆14 Updated 7 years ago
- Package mbox parses the mbox file format into messages and formats messages into mbox files ☆72 Updated 3 weeks ago
- CLD2 (Compact Language Detector 2) bindings for Go (golang) ☆38 Updated 5 years ago
- A Reader/ReaderAt for Go that uses Range requests to get files over HTTP ☆28 Updated 2 years ago
- Miscellaneous tools for processing WARC files from the CommonCrawl ☆24 Updated 11 years ago
- An implementation of the Goose HTML Content / Article Extractor algorithm in golang ☆40 Updated 4 years ago
- A Go implementation of the Cassowary constraint solving algorithm. ☆77 Updated 4 years ago
- grobotstxt is a native Go port of Google's robots.txt parser and matcher library. ☆110 Updated 3 years ago
- Offline language detection ☆47 Updated 7 years ago
- Golang WARC (Web ARChive) Library ☆30 Updated 5 years ago
- The robots.txt exclusion protocol implementation for Go language ☆273 Updated 2 years ago
- Go XML Pull Parser ☆34 Updated 4 months ago
- Stemmer packages for Go programming language. Includes English, German and Dutch stemmers. ☆53 Updated 8 years ago
- A Go package that implements the JusText boilerplate removal algorithm ☆109 Updated 2 years ago
- A generic patricia trie (also called radix tree) implemented in Go (Golang) ☆28 Updated 5 years ago
- mediawiki dump parser for loading up wikipedia data ☆103 Updated last year
- ☆31 Updated 7 months ago
- High Performance Porter2 Stemmer ☆46 Updated 4 years ago
- A Go implementation of the readability algorithm by arc90 labs ☆133 Updated 2 years ago
- Chrome Automation Library using Google Chrome Remote Debugger API in Go ☆85 Updated 3 years ago
- Parse JPEG data into segments via code or CLI from pure Go. Read/export/write EXIF data. Read XMP and IPTC metadata. ☆76 Updated 2 years ago
- A library for abstracting away from the literal Go time library, for testing and time control. ☆58 Updated 2 years ago
- Cross-platform library to ensure only one instance of a program runs (based on Python's tendo) ☆48 Updated 4 years ago
- A simple, lightweight, embedded geocoder for Golang with city level accuracy ☆73 Updated 9 years ago
- Wikidata API bindings in go. ☆27 Updated last year
- An approximate string matching library for the Go programming language. ☆177 Updated 2 years ago
- This package helps to work with huge amount of data, which cannot be stored in RAM ☆43 Updated 2 years ago
- A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE) ☆120 Updated 4 months ago
- Golang package to extract useful text from a HTML document ☆40 Updated last year