spencermountain / dumpster-dive
roll a wikipedia dump into mongo
☆249 Updated last year
Alternatives and similar repositories for dumpster-dive
Users who are interested in dumpster-dive are comparing it to the libraries listed below.
- a pretty-committed wikipedia markup parser ☆843 Updated 4 months ago
- 🎀 JavaScript API for spaCy with Python REST API ☆198 Updated 2 years ago
- varied english texts for modern NLP testing ☆78 Updated 3 years ago
- WordNet in JSON format. ☆95 Updated 5 years ago
- FastText for Node.js ☆198 Updated 2 years ago
- command-line tool to extract taxonomies from Wikidata ☆129 Updated 6 years ago
- plugin to extract keywords and key-phrases ☆337 Updated last year
- spaCy REST API, wrapped in a Docker container. ☆267 Updated 2 years ago
- NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more. ☆133 Updated last year
- JS utils functions to query a Wikibase instance and simplify its results ☆341 Updated 3 weeks ago
- displaCy.js: An open-source NLP visualiser for the modern web ☆345 Updated 7 years ago
- Multilingual tokenizer that automatically tags each token with its type ☆63 Updated 2 years ago
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆104 Updated 3 months ago
- TextRank algorithm implementation in Javascript ☆40 Updated 10 years ago
- This project represents the 300-dimensional word vectors from word2vec as JSON. ☆129 Updated 9 years ago
- One trick pony NLP library for extracting keywords from HTML documents ☆18 Updated 9 years ago
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ ☆205 Updated 7 years ago
- A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text. ☆98 Updated 3 years ago
- Text summarization using Lexrank ☆54 Updated 7 years ago
- Outputs a list of ranked DBpedia resources for a search string. ☆187 Updated 4 years ago
- LanguageCrunch NLP server docker image ☆285 Updated 3 years ago
- Word embeddings for the web ☆28 Updated 2 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆189 Updated 2 weeks ago
- English lexicon useful in NLP/NLU ☆16 Updated 2 years ago
- displaCy-ent.js: An open-source named entity visualiser for the modern web ☆199 Updated 7 years ago
- Index Common Crawl archives in tabular format ☆124 Updated last week
- Scripts and microservice to feed an ElasticSearch with Wikidata and Inventaire entities, and keep those up-to-date ☆41 Updated 4 years ago
- Tool for exploring Word Vector models ☆180 Updated 7 years ago
- tool for collectively summarizing large discussions ☆145 Updated 3 years ago
- A temporal ordering system for events and time expressions in written text. ☆42 Updated 3 years ago