EmilStenstrom / conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
☆312 · Updated last month
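The following is a minimal, standard-library-only sketch of what "turns it into a nested Python dictionary" means for CoNLL-U input. It is not the conllu package's implementation or API (the package exposes its own `parse` function with richer return types). The function names `parse_conllu`, `parse_token`, and `parse_feats`, as well as the `{"metadata": ..., "tokens": [...]}` layout, are hypothetical and exist only for illustration. The sketch assumes well-formed input: sentences separated by blank lines, `# key = value` comment lines, and token lines with exactly ten tab-separated columns.

```python
from __future__ import annotations

from typing import Any, Optional

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_feats(value: str) -> Optional[dict[str, str]]:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}; '_' means empty."""
    if value == "_":
        return None
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_token(line: str) -> dict[str, Any]:
    """Map one tab-separated token line onto a dict keyed by column name."""
    values = line.split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected 10 tab-separated fields, got {len(values)}: {line!r}")
    token: dict[str, Any] = dict(zip(FIELDS, values))
    # Plain integer IDs and heads become ints; multiword ranges such as '1-2'
    # and empty-node IDs such as '8.1' stay strings in this simplified sketch.
    for key in ("id", "head"):
        if token[key].isdigit():
            token[key] = int(token[key])
    token["feats"] = parse_feats(token["feats"])
    return token


def parse_conllu(text: str) -> list[dict[str, Any]]:
    """Split CoNLL-U text into sentences, each holding metadata and a token list."""
    sentences: list[dict[str, Any]] = []
    metadata: dict[str, str] = {}
    tokens: list[dict[str, Any]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens or metadata:
                sentences.append({"metadata": metadata, "tokens": tokens})
            metadata, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines of the form '# key = value' become metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            tokens.append(parse_token(line))
    # Flush a final sentence that is not followed by a blank line.
    if tokens or metadata:
        sentences.append({"metadata": metadata, "tokens": tokens})
    return sentences


sample = (
    "# sent_id = 1\n"
    "# text = The dog barks.\n"
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "3\tbarks\tbark\tVERB\tVBZ\tMood=Ind|Tense=Pres\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in parse_conllu(sample):
        print(sentence["metadata"]["text"])
        for token in sentence["tokens"]:
            print(token["id"], token["form"], token["upos"], token["head"], token["feats"])
```

Keeping range and decimal IDs as strings, and leaving DEPS and MISC unparsed, keeps the sketch short. A full parser would also need to handle those columns.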
Related projects
Alternatives and complementary repositories for conllu
- Various utilities for processing the data. ☆207 · Updated this week
- spaCy + UDPipe ☆161 · Updated 2 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆203 · Updated 2 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆149 · Updated last year
- A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology… ☆220 · Updated last year
- spacy-wordnet creates annotations that make it easy to use WordNet and WordNet Domains through the NLTK WordNet interface. ☆249 · Updated 2 months ago
- Language-independent truecaser in Python. ☆161 · Updated 3 years ago
- Universal Dependencies online documentation ☆273 · Updated this week
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆725 · Updated 3 months ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆364 · Updated last week
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆351 · Updated last year
- Python port of Moses tokenizer, truecaser and normalizer ☆488 · Updated 5 months ago
- Automatic extraction of edited sentences from text edition histories. ☆81 · Updated 2 years ago
- Unsupervised Statistical Machine Translation ☆228 · Updated 4 years ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆185 · Updated 4 years ago
- Implementation of the ClausIE information extraction system for Python + spaCy ☆220 · Updated 2 years ago
- Google USE (Universal Sentence Encoder) for spaCy ☆177 · Updated last year
- LASER multilingual sentence embeddings as a pip package ☆225 · Updated last year
- English data ☆201 · Updated this week
- German Morphological Analyzer ☆47 · Updated 3 years ago
- A Python module for English lemmatization and inflection. ☆261 · Updated last year
- A neural word aligner based on multilingual BERT ☆328 · Updated 2 years ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆135 · Updated 3 months ago
- Efficient Low-Memory Aligner ☆139 · Updated 2 months ago
- Efficient and clean PyTorch reimplementation of "End-to-end Neural Coreference Resolution" (Lee et al., EMNLP 2017). ☆185 · Updated 3 years ago
- Open-Source Machine Translation Quality Estimation in PyTorch ☆228 · Updated 2 years ago
- A CoNLL-formatted version of the OntoNotes 5.0 release. ☆190 · Updated 9 years ago
- CoNLL-U to pandas DataFrame ☆31 · Updated 7 years ago
- Enhanced Subject Word Object Extraction ☆148 · Updated 3 years ago
- Easier Automatic Sentence Simplification Evaluation ☆159 · Updated last year