This is a mirror of the script by Giuseppe Attardi and contains history from before the official repo started: https://github.com/attardi/wikiextractor
Extracts and cleans text from a Wikipedia database dump and stores the output in a number of files of similar size in a given directory.
☆259 · Aug 17, 2016 · Updated 9 years ago
Alternatives and similar repositories for wikipedia-extractor
Users who are interested in wikipedia-extractor compare it to the repositories listed below.
- A tool for extracting plain text from Wikipedia dumps ☆3,971 · May 23, 2024 · Updated last year
- Simple Wikipedia plain text extractor with article link annotations and Hadoop support. ☆103 · Mar 13, 2011 · Updated 14 years ago
- An experimental cryptographic virtual machine ☆14 · Feb 15, 2017 · Updated 9 years ago
- FoGFaaS: Add serverless computing (faas) to ifogsim ☆22 · Mar 30, 2025 · Updated 11 months ago
- Tool for extracting plain text from wikipedia data ☆31 · Mar 14, 2016 · Updated 9 years ago
- Crawling and analyzing data on Wikipedia ☆17 · Mar 8, 2024 · Updated last year
- Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby ☆601 · Jan 11, 2018 · Updated 8 years ago
- The SRL-based Open IE extractor. A principal component of Open IE 4.0. ☆19 · Oct 31, 2017 · Updated 8 years ago
- Fact Extraction from Wikipedia Text ☆538 · Apr 15, 2016 · Updated 9 years ago
- A Python parser for MediaWiki wikicode ☆862 · Jul 1, 2025 · Updated 8 months ago
- A different approach to the idea of a crypto currency. ☆48 · Apr 7, 2014 · Updated 11 years ago
- A Utility Library for Wikipedia dumps ☆33 · Feb 24, 2017 · Updated 9 years ago
- iPython-based tutorial in Noun Phrase chunking with the NLTK. Written to accompany PyCon 2015 poster presentation. ☆17 · Apr 12, 2015 · Updated 10 years ago
- Question answering dataset featured in "Teaching Machines to Read and Comprehend" ☆1,297 · Apr 26, 2017 · Updated 8 years ago
- Extract statistics from Wikipedia Dump files. ☆26 · Aug 2, 2021 · Updated 4 years ago
- Source code for the tutorial series at http://www.thoughtly.co/blog/prototype ☆32 · Feb 27, 2015 · Updated 11 years ago
- A simple tool for small scale experiments using bayesian optimization ☆35 · Aug 14, 2018 · Updated 7 years ago
- AMALGrAM, an English supersense tagger written in Python ☆33 · May 31, 2017 · Updated 8 years ago
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 · Sep 30, 2016 · Updated 9 years ago
- Implementation of Spatial Contrasting Network in Keras. ☆20 · Nov 2, 2016 · Updated 9 years ago
- 💫 Runtime performance comparison of spaCy against other NLP libraries ☆20 · Aug 31, 2022 · Updated 3 years ago
- Extract (DOM tree) repetitions from a webpage ☆12 · Jan 13, 2014 · Updated 12 years ago
- Speech ANDroid Apps ☆20 · Jan 22, 2014 · Updated 12 years ago
- Arabic Word-Embedding (Word2vec) model training from Wikipedia articles ☆11 · Dec 13, 2018 · Updated 7 years ago
- A platform for storing large semantic networks on MongoDB ☆22 · Jun 20, 2011 · Updated 14 years ago
- Topic models extension for Mallet & scikit-learn ☆49 · Mar 27, 2017 · Updated 8 years ago
- Automatic .gif creation from Youtube videos! ☆56 · Dec 5, 2014 · Updated 11 years ago
- We have moved! ☆10 · Mar 29, 2016 · Updated 9 years ago
- Automatically exported from code.google.com/p/wiki-links ☆43 · Dec 15, 2015 · Updated 10 years ago
- ESA implementation using Wikiprep output ☆56 · Oct 18, 2013 · Updated 12 years ago
- Support for the django-rq admin when using django-suit ☆19 · Mar 9, 2023 · Updated 2 years ago
- Attempt at using LSTMs to predict semantic relatedness of sentences (a la Tai et al. in Improved Semantic Representations From Tree-Struc… ☆22 · Nov 29, 2015 · Updated 10 years ago
- Repo for data surrounding fast food nutrition and ingredients ☆10 · Nov 11, 2018 · Updated 7 years ago
- regex powered yank+substitute ☆13 · Oct 23, 2017 · Updated 8 years ago
- Omgrofl interpreter ☆16 · Oct 1, 2020 · Updated 5 years ago
- Regularized latent variable mixed membership modeling ☆13 · Aug 12, 2013 · Updated 12 years ago
- A plug-in architecture for extending Siri virtual assistant ☆29 · Mar 30, 2014 · Updated 11 years ago
- Quick and dirty script to parse bplists with Ruby ☆13 · Oct 29, 2020 · Updated 5 years ago
- Collection of Workflows for the iOS app Workflow (http://workflow.is) ☆10 · Feb 19, 2016 · Updated 10 years ago