raduangelescu / gutenbergpy
Gutenberg cache and query library
☆37 Updated 10 months ago
Alternatives and similar repositories for gutenbergpy
Users who are interested in gutenbergpy are comparing it to the libraries listed below.
- a python package for cleaning Gutenberg books and dataset ☆34 Updated last month
- This is a collection of sentence-level aligned Sanskrit-Tibetan Etexts. ☆15 Updated 2 years ago
- Poetic processing, for Python. ☆40 Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 Updated 3 years ago
- Interactive visualization of Wiktionary words and etymologies. ☆92 Updated 3 months ago
- A tool for analyzing the word histories of a text. ☆34 Updated 6 months ago
- JSON representation of the Zotero data model ☆54 Updated 4 months ago
- eXtensible Interlinear Glossed Text ☆33 Updated 3 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 Updated 2 months ago
- Datasette plugin for uploading CSV files and converting them to database tables ☆26 Updated last year
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated 2 years ago
- Python parser for the Archie Markup Language (ArchieML) ☆12 Updated 3 years ago
- Metadata from Project Gutenberg ☆41 Updated 2 months ago
- Multilingual syllable annotation pipeline component for spacy ☆39 Updated 2 years ago
- Tool for writing and generating interactive books. ☆49 Updated 6 years ago
- Get the scholarly citation for any research product: software, preprint, paper, or dataset ☆81 Updated 2 years ago
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models ☆32 Updated last year
- ☆30 Updated 8 years ago
- A maximum-strength name parser for record linkage. ☆37 Updated last month
- Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages. ☆20 Updated 3 weeks ago
- Sanskrit Tibetan Parallel Dataset ☆11 Updated last year
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆17 Updated last week
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆33 Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- WordWanderer – take your text for a walk ☆12 Updated 6 years ago
- An open etymology dataset created using Wiktionary data. Contains 3.8M entries, 1.8M terms, 2900 languages, and 31 unique relationship ty… ☆122 Updated last year
- Scrapes some Finnish word definitions from English Wiktionary. ☆8 Updated last year
- Jurisdiction ID and abbreviation data files for using with Jurism and other projects. ☆37 Updated last year
- Inspect a URL and estimate if it contains a news story ☆39 Updated 6 months ago
- The Wikinflection Corpus, from the paper "Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus" (Metheni… ☆12 Updated last year