jfilter / german-preprocessing
🇩🇪 Preprocess German texts to do some serious natural-language processing.
★11 · Updated last year
Related projects
Alternatives and complementary repositories for german-preprocessing
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ★451 · Updated 3 weeks ago
- A lemmatizer for German language text ★87 · Updated last year
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat… ★148 · Updated 8 months ago
- High-performance text aligner for large collections of texts ★45 · Updated last month
- A simple text reuse detection CLI tool. ★126 · Updated 5 months ago
- Poetic processing, for Python. ★38 · Updated 6 months ago
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages. ★22 · Updated last year
- CLDF: Cross-Linguistic Data Formats - the specification ★55 · Updated 7 months ago
- A French Lemmatizer in Python based on the LEFFF ★36 · Updated 4 years ago
- German language support for TextBlob. ★104 · Updated 3 years ago
- Human Language Technology Notebooks for Lab sessions, Master Students ★14 · Updated last month
- Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support. ★277 · Updated 3 weeks ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus ★17 · Updated 6 months ago
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ★124 · Updated 3 years ago
- Supervised Stylometry ★21 · Updated last week
- (no description) ★13 · Updated last month
- The Hanover Tagger - A simple approach to lemmatization and POS-tagging of German morphology based on heuristics and hidden Markov models… ★47 · Updated last year
- Open German WordNet ★88 · Updated 9 months ago
- GermaParl: Corpus of Plenary Protocols of the German Bundestag (TEI Format) ★30 · Updated last year
- Information extraction from English and German texts based on predicate logic ★389 · Updated 2 years ago
- A Python module to manipulate data on a Wikibase instance (like Wikidata) through the MediaWiki Wikibase API and the Wikibase SPARQL endp… ★67 · Updated this week
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ★70 · Updated this week
- Diachronic Spanish Sonnet Corpus. Canonical and minor authors in Spanish (Europe, America and Asia): 15th to 20th century ★15 · Updated last year
- This repository contains all the materials for my "Python Programming for Linguists" workshop. This is a Python workshop for beginners wi… ★27 · Updated last year
- (no description) ★79 · Updated last week
- 👩‍🔬 A web-based, open-access platform for linguistic research on Old Indic texts ★17 · Updated last month
- Ten Thousand German News Articles Dataset for Topic Classification ★84 · Updated 2 years ago
- A tokenizer and sentence splitter for German and English web and social media texts. ★135 · Updated 3 months ago
- Helsinki Finite-State Technology (library and application suite) ★124 · Updated this week
- Description of the Project. If you have any suggestions for the entire project, please add them as an issue to this repository! ★21 · Updated 2 months ago