roedoejet / convertextract
Extract and find/replace text based on arbitrary correspondences while preserving original file formatting. This library is a fork of the Textract library by Dean Malmgren.
☆11 Updated last year
Alternatives and similar repositories for convertextract:
Users who are interested in convertextract are comparing it to the libraries listed below:
- Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language ☆16 Updated this week
- Domain-specific programming language for linguistic grammars and transducers — Langage dédié pour les grammaires linguistiques et les tra… ☆13 Updated this week
- universal syllabification algorithms ☆44 Updated 2 years ago
- Audiobook alignment for Indigenous languages ☆39 Updated last month
- English to IPA with syllable correspondence ☆11 Updated 2 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆35 Updated last month
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆47 Updated last year
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated last year
- The Data Format for Digital Linguistics (DaFoDiL) ☆22 Updated 2 years ago
- A tool for automatic phoneme transcription ☆157 Updated last year
- Transform TMX to text ☆28 Updated 2 years ago
- Python Finite-State Toolkit ☆54 Updated last month
- A simple configurable tool for manipulating dependency trees. ☆13 Updated 3 months ago
- The Metadata Editor for Transparent Archiving of language document materials ☆20 Updated 2 months ago
- Script for workflow to add morphological analysis into ELAN files ☆13 Updated 4 years ago
- Program used to split text into segments ☆25 Updated 5 months ago
- Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques. ☆23 Updated last year
- Tools and scripts for working with ELAN ☆10 Updated 2 years ago
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 Updated 5 years ago
- Python module for syllabifying English ARPABET transcriptions ☆66 Updated 6 years ago
- 🙊 software for creating speech recognition models. ☆159 Updated 10 months ago
- PyAnnotation is a Python Library to access and manipulate linguistically annotated corpus files. ☆17 Updated 12 years ago
- Bilingual sentence aligner (Gale & Church, 1993) ☆14 Updated 6 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆12 Updated last year
- MIT Language Modeling Toolkit ☆116 Updated 5 years ago
- eXternally configurable REference and Non Named Entity Recognizer ☆17 Updated 9 months ago
- A tool for automatic spelling normalization ☆20 Updated 4 years ago
- Language Tool style grammar handling with spaCy 2.0 ☆42 Updated 6 years ago
- A compound splitter based on the semantic regularities in the vector space of word embeddings. ☆16 Updated 8 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆28 Updated 3 years ago