spencermountain / dumpster-dive
roll a wikipedia dump into mongo
☆243 · Updated 11 months ago
Alternatives and similar repositories for dumpster-dive
Users interested in dumpster-dive are comparing it to the libraries listed below.
- a pretty-committed wikipedia markup parser (☆815, updated last month)
- tools for working with Princeton's lexical database WordNet (☆73, updated 6 years ago)
- command-line tool to extract taxonomies from Wikidata (☆126, updated 6 years ago)
- English NLP for Node.js and the browser (☆87, updated last year)
- Creates a Neo4j graph of Wikipedia links (☆255, updated 7 years ago)
- Index Common Crawl archives in tabular format (☆122, updated last month)
- NLP functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more (☆131, updated last year)
- Streaming WARC/ARC library for fast web archive IO (☆416, updated 6 months ago)
- text mining utilities for Node.js (☆141, updated 2 years ago)
- LDA topic modeling for Node.js (☆297, updated 10 months ago)
- Multilingual tokenizer that automatically tags each token with its type (☆62, updated 2 years ago)
- Json Wikipedia: code to convert the Wikipedia XML dump into a JSON/Avro dump (☆253, updated last year)
- Node bindings for Annoy, an efficient Approximate Nearest Neighbors implementation written in C++ (☆82, updated last year)
- Word embeddings for the web (☆28, updated 2 years ago)
- Get n-grams from text (☆82, updated 2 years ago)
- A semi-unsupervised, language-independent morphological analyzer useful for stemming unknown-language text, or getting a rough estimate of… (☆21, updated 7 years ago)
- 🎀 JavaScript API for spaCy with Python REST API (☆197, updated last year)
- CLDR text segmentation for JavaScript (☆38, updated last year)
- WordNet database files (previously WNdb) (☆216, updated 5 years ago)
- A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text (☆95, updated 3 years ago)
- varied English texts for modern NLP testing (☆75, updated 3 years ago)
- Visualize Wikidata items using d3.js (☆198, updated 2 months ago)
- Filter and format a newline-delimited JSON stream of Wikibase entities (☆97, updated 2 weeks ago)
- Python package for Wikimedia dump processing (Wiktionary, Wikipedia etc.): wikitext parsing, template expansion, Lua module execution. Fo… (☆102, updated last month)
- Imports Wikidata JSON dumps into Neo4j in a meaningful way (☆66, updated 6 years ago)
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine (☆177, updated 5 months ago)
- Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head (☆170, updated 5 years ago)
- English lemmatizer (☆67, updated 2 years ago)
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework (☆167, updated 3 years ago)
- Article extraction benchmark: dataset and evaluation scripts (☆317, updated last year)