berkmancenter / corpusbuilder
Corpus Build OCR platform
☆8 Updated 2 years ago
Alternatives and similar repositories for corpusbuilder:
Users who are interested in corpusbuilder are comparing it to the libraries listed below.
- Python tools for text ☆15 Updated 4 years ago
- Data Mining Historical Newspaper Metadata (METS/ALTO formats) ☆24 Updated 2 years ago
- Visual analytics application for qualitative text analysis ☆24 Updated 2 years ago
- A deep learning architecture for reference mining from literature in the arts and humanities. ☆15 Updated 5 years ago
- Service for creating Twitter datasets for research and archiving. ☆26 Updated 2 years ago
- A browser extension providing Open Access bibliographical services ☆14 Updated 2 years ago
- ☆12 Updated 5 years ago
- Scripts to take hand washing related text in (almost) any language and float it into a hand washing poster. ☆9 Updated 3 years ago
- Open Access PDF harvester ☆35 Updated 8 months ago
- Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018). ☆14 Updated 5 years ago
- MoodCat😼 classifies the mood of English sentences. ☆14 Updated 2 years ago
- Compare accuracies of udpipe models and spacy models which can be used for NLP annotation ☆14 Updated 6 years ago
- Python API for KB data-services ☆18 Updated 4 years ago
- IWAAN - An interactive Jupyter Notebook collection that allows to run analyses of Wikipedia article editing dynamics out-of-the-box on Bi… ☆9 Updated 8 months ago
- Extract structured data online ☆12 Updated last year
- ☆17 Updated this week
- Wrapper around pixel classifier ☆9 Updated 2 years ago
- R tools to download, ingest, and analyze the Phoenix dataset from the Open Event Data Alliance ☆12 Updated 8 years ago
- An alternative approach for probabilistic topic modeling based on agglomerative clustering of topics (not documents) ☆12 Updated 3 years ago
- A Python toolkit to generate a tokenized dump of Wikipedia for NLP ☆11 Updated 8 months ago
- Crawling and analyzing data on Wikipedia ☆16 Updated 10 months ago
- TopicScan: Visualization and validation interface for NMF Topic Modeling ☆23 Updated 4 years ago
- Topic Modeling Workflow in Python ☆16 Updated last year
- ☆12 Updated 9 months ago
- Various functions to make bag-of-words approaches to text analysis more user-friendly ☆24 Updated 7 years ago
- A search engine built on the Unpaywall database ☆18 Updated 10 months ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 Updated 2 years ago
- Process, enhance and evaluate multiple OCR output. ☆22 Updated 2 months ago
- wrapper for the crossref events api ☆17 Updated last year