ChrisCates / CommonCrawler
A simple way to extract data from Common Crawl
⭐ 33 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for CommonCrawler
- bento is an English-based automation language designed to be used by non-technical people. · ⭐ 32 · Updated 5 years ago
- Runs `go generate` recursively on a specified path or environment variable, with optional regex filtering · ⭐ 30 · Updated 7 years ago
- Summarizes text · ⭐ 38 · Updated 9 years ago
- package lingo provides the data structures and algorithms required for natural language processing · ⭐ 153 · Updated last year
- Natural Language Processing Toolkit in Golang · ⭐ 63 · Updated 4 years ago
- Go package that helps write to files atomically · ⭐ 10 · Updated 2 years ago
- Simple Go library for executing lots of operations spread over any number of threads · ⭐ 73 · Updated last year
- Advanced declarative web scraping · ⭐ 30 · Updated last year
- Reads zip files from io.Reader · ⭐ 55 · Updated last week
- a tiny package that implements SMTP server for Go projects · ⭐ 106 · Updated 11 months ago
- web-based UI editor for bleve index mappings · ⭐ 24 · Updated 2 weeks ago
- Tagify produces a set of tags from a given source. Source can be either an HTML page, a Markdown document or a plain text. Supports Engli… · ⭐ 38 · Updated 4 months ago
- doc2vec and word2vec implemented in Go, for word embedding representation · ⭐ 41 · Updated 6 years ago
- An easy-to-use, lightweight embedded on-disk database built on Badger for use in your Go programs. · ⭐ 52 · Updated 4 years ago
- A generic patricia trie (also called radix tree) implemented in Go (Golang) · ⭐ 28 · Updated 5 years ago
- Simple Email Parser · ⭐ 47 · Updated 8 years ago
- Go client for newsapi (https://newsapi.org/) · ⭐ 37 · Updated 4 years ago
- A Go package for n-gram based text categorization, with support for utf-8 and raw text · ⭐ 72 · Updated 3 years ago
- Instagram power tool · ⭐ 57 · Updated 5 years ago
- Read and use word2vec vectors in Go · ⭐ 56 · Updated 6 years ago
- Redis style logger for Go · ⭐ 28 · Updated 3 years ago
- Start Go command line apps with ease · ⭐ 16 · Updated last year
- Go implementation of different backoff strategies useful for retrying operations and heartbeating. · ⭐ 85 · Updated 4 years ago
- Go Stanford NLP POS Tagger wrapper · ⭐ 38 · Updated 7 years ago
- Tiny utility Go client for HackerNews API. · ⭐ 17 · Updated 7 years ago
- A distributed forward caching proxy for Go's http.Client supporting TLS · ⭐ 31 · Updated 6 years ago
- Miscellaneous tools for processing WARC files from Common Crawl · ⭐ 22 · Updated 10 years ago
- Structured scraper for Go · ⭐ 25 · Updated 6 years ago
- A simple tool to collect and process news articles from multiple web sources · ⭐ 34 · Updated 2 years ago
- Go package for abstracting local, in-memory, and remote (Google Cloud Storage/S3) filesystems · ⭐ 52 · Updated 6 years ago