spencermountain / dumpster-dive
roll a wikipedia dump into mongo
☆242 · Updated 11 months ago
Alternatives and similar repositories for dumpster-dive
Users who are interested in dumpster-dive are comparing it to the libraries listed below.
- a pretty-committed wikipedia markup parser ☆812 · Updated last week
- 🎀 JavaScript API for spaCy with Python REST API ☆198 · Updated last year
- varied english texts for modern NLP testing ☆75 · Updated 2 years ago
- ⚙️ [Processor] A better English POS tagger written in JavaScript ☆54 · Updated 8 years ago
- JS utils functions to query a Wikibase instance and simplify its results ☆331 · Updated last month
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆97 · Updated 7 months ago
- command-line tool to extract taxonomies from Wikidata ☆126 · Updated 5 years ago
- Part-of-speech utilities for node.js based on the WordNet database. ☆477 · Updated 2 years ago
- NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more. ☆130 · Updated last year
- WordNet in JSON format. ☆91 · Updated 4 years ago
- Visualize Wikidata items using d3.js ☆198 · Updated last month
- A thin GraphQL wrapper around spacy ☆21 · Updated 4 years ago
- Parse And Create Web ARChive (WARC) files with node.js ☆98 · Updated 4 months ago
- Imports WikiData JSON dumps into Neo4j in a meaningful way. ☆64 · Updated 6 years ago
- A Wordnet API in pure JavaScript ☆109 · Updated 2 years ago
- read and edit a Wikibase instance from the command line ☆231 · Updated 2 weeks ago
- A machine learning tool for fishing entities ☆264 · Updated 2 weeks ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json/avro dump ☆253 · Updated last year
- plugin to extract keywords and key-phrases ☆333 · Updated 7 months ago
- Multilingual tokenizer that automatically tags each token with its type ☆62 · Updated 2 years ago
- Expose Spacy nlp text parsing to Nodejs (and other languages) via socketIO ☆225 · Updated 2 years ago
- This project represents the 300-dimensional word vectors from word2vec as JSON. ☆125 · Updated 8 years ago
- Entity linking system for Wikidata updated by your edits in real time ☆254 · Updated 6 months ago
- Wikidata client library for Python ☆354 · Updated 10 months ago
- Outputs a list of ranked DBpedia resources for a search string. ☆186 · Updated 3 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆165 · Updated 3 years ago
- Word embeddings for the web ☆28 · Updated 2 years ago
- fasttag part of speech tagger javascript implementation ☆278 · Updated 5 years ago
- spaCy REST API, wrapped in a Docker container. ☆267 · Updated 2 years ago
- English lexicon useful in NLP/NLU ☆15 · Updated 2 years ago