spencermountain / dumpster-dive
Roll a Wikipedia dump into MongoDB
☆242 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for dumpster-dive
- A pretty-committed Wikipedia markup parser ☆779 · Updated 4 months ago
- A command-line tool for using the CommonCrawl Index API at http://index.commoncrawl.org/ ☆181 · Updated 6 years ago
- WordNet in JSON format. ☆91 · Updated 4 years ago
- Expose spaCy NLP text parsing to Node.js (and other languages) via Socket.IO ☆225 · Updated last year
- Varied English texts for modern NLP testing ☆73 · Updated 2 years ago
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆97 · Updated last month
- 🎀 JavaScript API for spaCy with Python REST API ☆193 · Updated last year
- NLP functions for amplifying negations, managing elisions, creating n-grams, stems, phonetic codes to tokens and more. ☆124 · Updated 8 months ago
- Parse and create Web ARChive (WARC) files with Node.js ☆94 · Updated last year
- Command-line tool to extract taxonomies from Wikidata ☆125 · Updated 5 years ago
- JS utility functions to query a Wikibase instance and simplify its results ☆326 · Updated last month
- ⚙️ [Processor] A better English POS tagger written in JavaScript ☆53 · Updated 7 years ago
- Lexical database of any language ☆175 · Updated 2 years ago
- spaCy REST API, wrapped in a Docker container. ☆265 · Updated last year
- Scripts and microservice to feed Elasticsearch with Wikidata and Inventaire entities, and keep those up to date ☆41 · Updated 3 years ago
- One-trick-pony NLP library for extracting keywords from HTML documents ☆18 · Updated 8 years ago
- Read and edit a Wikibase instance from the command line ☆227 · Updated this week
- Fast Double Metaphone algorithm ☆87 · Updated 2 years ago
- University of Colorado VerbNet ☆101 · Updated 6 months ago
- Entity linking system for Wikidata, updated by your edits in real time ☆252 · Updated last year
- FastText for Node.js ☆194 · Updated last year
- Multilingual tokenizer that automatically tags each token with its type ☆61 · Updated last year
- Imports Wikidata JSON dumps into Neo4j in a meaningful way. ☆63 · Updated 5 years ago
- TextRank algorithm implementation in JavaScript ☆40 · Updated 9 years ago
- Creates a Docker image with Virtuoso preloaded with the latest DBpedia dataset ☆120 · Updated 3 weeks ago
- This repository contains the code behind the visualization of the Wikimedia tool etytree at http://tools.wmflabs.org/etytree/ ☆50 · Updated 5 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆97 · Updated 6 years ago
- DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text. Improving Efficiency and Accuracy in Mult… ☆178 · Updated last year
- AmbiverseNLU: A Natural Language Understanding suite by Max Planck Institute for Informatics ☆208 · Updated 11 months ago