vikasing / news-stopwords
A huge list of stopwords collected from millions of news articles
☆14Updated 7 years ago
Alternatives and similar repositories for news-stopwords:
Users who are interested in news-stopwords are comparing it to the libraries listed below:
- SOLR bulk indexing utility for the command line.☆45Updated last week
- A highly concurrent Go solution to build sitemaps. Features CLI, proxy support, rate limiting, robots.txt compliant, url filtering and ma…☆22Updated 8 years ago
- Span formats.☆17Updated last week
- Benchmark tests for various graph databases.☆41Updated 6 years ago
- A place to collect and share knowledge about liberating data from PDFs☆54Updated 3 years ago
- This project allows one to run SQL queries over arbitrary JSON documents and Protocol Buffer objects.☆8Updated 7 years ago
- golang tools for Apache Solr☆27Updated 3 months ago
- A simple system for archiving and OCRing documents built for cloud-friendly search and backup.☆22Updated 4 years ago
- Fast Word Segmentation with Triangular Matrix☆79Updated 3 years ago
- Archival. Things I wrote about RDF from the mid-2000's. The validator is no longer maintained, sorry.☆109Updated 10 years ago
- ☆21Updated 6 years ago
- Create daily digests of activity to GitHub repositories☆40Updated 6 years ago
- Fabric is a simple triplestore written in Golang☆198Updated 2 years ago
- cartconvert is a package providing a set of cartography functions for the Go programming language☆31Updated 5 years ago
- Newshound: The Breaking News Email Aggregator☆88Updated 2 years ago
- Miscellaneous tools for processing WARC files from the CommonCrawl☆24Updated 11 years ago
- Nifty library to manage, query and store RDF triples. Make RDF great again!☆115Updated 5 years ago
- Toys for sifting through large sets of documents.☆13Updated 7 years ago
- Locality Sensitive Hashing using Golang and SQL database☆28Updated 8 years ago
- A Linked Data publishing framework in PHP. It uses Paget to dispatch URIs and build a local index from the SPARQL query result. See also…☆25Updated 10 years ago
- A spiritual successor to dmoz.org☆16Updated last year
- Select elements from large XML files, fast.☆54Updated last month
- A set of tools for working with JSON, CSV and Excel workbooks☆77Updated 2 months ago
- A golang library to work with WARC files from the common crawl☆14Updated 6 years ago
- Structured Data linter☆90Updated 3 months ago
- LDA-Based Topic Modelling in Javascript☆45Updated 10 years ago
- GitHub Webhooks Made Easy!☆32Updated last month
- Simple SQL finite state machine for Postgres☆63Updated 4 years ago
- AWK and Bash code to easily parse CSV files, with possibly embedded commas and quotes.☆54Updated 6 years ago
- A Go package for n-gram based text categorization, with support for utf-8 and raw text☆73Updated last month