DH-Box / corpus-downloader
A command-line program to download text corpora.
☆34 · Aug 12, 2017 · Updated 8 years ago
Alternatives and similar repositories for corpus-downloader
Users who are interested in corpus-downloader are comparing it to the libraries listed below.
- Scripts for scraping metadata from Project Gutenberg books, via GITenberg. ☆19 · Sep 11, 2018 · Updated 7 years ago
- ENGL 87400 - Text Transformations (Graduate Center, CUNY - Spring 2015) ☆12 · Mar 30, 2015 · Updated 10 years ago
- A structured list of text corpora, created for use with a corpus downloader. ☆13 · Aug 27, 2016 · Updated 9 years ago
- Work-in-progress list of funding opportunities for the digital humanities ☆14 · Jan 15, 2016 · Updated 10 years ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆12 · Feb 27, 2024 · Updated last year
- Python implementation of the Zeta score for contrastive text analysis ☆14 · Jun 16, 2021 · Updated 4 years ago
- DBpedia Neural Question Answering Dataset ☆18 · Jun 28, 2020 · Updated 5 years ago
- Regex-like pattern matching on sentence trees instead of strings ☆42 · Mar 6, 2018 · Updated 7 years ago
- Bayesian nonparametric models for python ☆18 · Sep 11, 2018 · Updated 7 years ago
- spaCy-to-naf converter ☆21 · Jun 10, 2025 · Updated 8 months ago
- Text-Induced Corpus Clean-up ☆20 · Jun 20, 2023 · Updated 2 years ago
- Topic Words in Context (TWiC) is a highly-interactive, browser-based visualization for MALLET topic models ☆50 · Jul 13, 2017 · Updated 8 years ago
- Ubiflux Vigor ventilation system RS485 Modbus communications with Python ☆11 · Jan 28, 2026 · Updated 2 weeks ago
- Fastlaw aims to replace generic word embeddings in supervised machine-learning NLP tasks on legal texts. ☆40 · Jun 18, 2019 · Updated 6 years ago
- InfiniteUlysses.com repo as it was when I finished the related Ph.D. project. See instead github.com/amandavisconti/infinite-ulysses-publ… ☆26 · Mar 15, 2022 · Updated 3 years ago
- A fast, simple, multilingual tokenizer ☆29 · May 24, 2017 · Updated 8 years ago
- Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents. ☆31 · Nov 3, 2017 · Updated 8 years ago
- spaCy match and replace, maintaining conjugation ☆36 · Dec 9, 2022 · Updated 3 years ago
- Lightweight, multilingual natural language processing ☆63 · Apr 8, 2013 · Updated 12 years ago
- A set of base classes for performing training scripts for Neural Networks (by means of SNNS) and SVM (by means of SVM Light and SVM … ☆14 · Jun 24, 2011 · Updated 14 years ago
- ☆10 · Jun 16, 2017 · Updated 8 years ago
- Hungarian tokenizer. ☆14 · Mar 15, 2022 · Updated 3 years ago
- Deploy a Ceramic daemon to AWS ☆13 · Apr 18, 2023 · Updated 2 years ago
- An online comic maker built by the State Library of Queensland for the international Fun Palaces event. Concept by Matt Finch, based on "… ☆10 · Jan 27, 2017 · Updated 9 years ago
- ☆10 · Jul 2, 2019 · Updated 6 years ago
- Simple CORPORA list crawler ☆10 · Dec 2, 2016 · Updated 9 years ago
- Supreme Court prediction model, "version" 2 ☆50 · Apr 24, 2017 · Updated 8 years ago
- Home Assistant custom component for Pollen Information in Hungary ☆15 · Jul 17, 2024 · Updated last year
- Simple interface to libmagic for Go Programming Language ☆13 · Jan 10, 2021 · Updated 5 years ago
- A practical introduction to Docker for data science ☆10 · May 13, 2019 · Updated 6 years ago
- An implementation of the Li and Roth paper "Learning Question Classifiers" on the TREC dataset ☆12 · Mar 7, 2017 · Updated 8 years ago
- USAAR participation in SemEval2015 ☆11 · Dec 21, 2022 · Updated 3 years ago
- Play multimedia files from org-mode ☆12 · Aug 20, 2018 · Updated 7 years ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot ☆13 · Dec 18, 2020 · Updated 5 years ago
- ☆10 · Jul 25, 2016 · Updated 9 years ago
- Single server/laptop grade file-observatory ☆10 · Mar 30, 2023 · Updated 2 years ago
- A Python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- Natural Language Processing tools ☆12 · Jan 26, 2017 · Updated 9 years ago
- Martini middleware/handler for serving static files from binary data ☆30 · May 17, 2014 · Updated 11 years ago