dbamman / book-nlp
Natural language processing pipeline for book-length documents (archival Java version; for current Python version, see: https://github.com/booknlp/booknlp)
☆311 Updated 3 years ago
Alternatives and similar repositories for book-nlp:
Users interested in book-nlp are comparing it to the libraries listed below.
- Collection of tools for building diachronic/historical word vectors ☆423 Updated last year
- Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities. ☆346 Updated 2 years ago
- A command-line program to download text corpora. ☆34 Updated 7 years ago
- A point-and-click tool for creating and analyzing topic models produced by MALLET. ☆107 Updated 3 years ago
- Retrofitting Word Vectors to Semantic Lexicons ☆375 Updated 5 years ago
- Various utilities for processing the data. ☆207 Updated this week
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆29 Updated 2 months ago
- English data ☆205 Updated this week
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆126 Updated 3 years ago
- Software and resources for natural language processing. ☆131 Updated 8 years ago
- PredPatt: Predicate-Argument Extraction from Universal Dependencies ☆111 Updated 3 years ago
- Tutorial on computational models of language change ☆114 Updated 5 years ago
- A toolkit for corpus linguistics ☆200 Updated 5 years ago
- Sample implementation of a politeness model, trained on the Stanford Politeness Corpus ☆148 Updated 2 years ago
- ConllEditor is a tool to edit dependency syntax trees in CoNLL-U format. ☆55 Updated 2 months ago
- System for building, visualizing, and working with LDA topic models ☆93 Updated this week
- A Python wrapper around the topic modeling functions of MALLET. ☆101 Updated 3 months ago
- Automatically exported from code.google.com/p/universal-pos-tags ☆129 Updated 2 years ago
- Digital Humanities Across Borders ☆47 Updated 11 months ago
- Take a MALLET to disciplinary history ☆99 Updated 2 years ago
- Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Dependencies ☆70 Updated 5 years ago
- Topic Words in Context (TWiC) is a highly-interactive, browser-based visualization for MALLET topic models ☆51 Updated 7 years ago
- An implementation of latent Dirichlet allocation in javascript ☆183 Updated 2 years ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated last year
- Socially-Equitable Language Identification ☆78 Updated last year
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆74 Updated 2 weeks ago
- Corpus of Open Access articles from multiple fields in Science, Technology, and Medicine. ☆73 Updated 7 years ago
- ☆97 Updated 3 years ago
- Repository for the Georgetown University Multilayer Corpus (GUM) ☆91 Updated this week
- Corpus of Spanish Golden-Age Sonnets (with metrical annotation) / Corpus de Sonetos del Siglo de Oro (con anotación métrica) ☆35 Updated 2 years ago