EmilStenstrom / conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
☆312 Updated last month
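To make "CoNLL-U string in, nested Python structure out" concrete, here is a minimal standard-library sketch of the idea. It is not the conllu library's own code or API: the names `FIELDS`, `parse_pairs`, `parse_conllu`, and `SAMPLE` are hypothetical, and the sketch assumes well-formed input with ten tab-separated columns per token line. The `id` and `head` values are left as strings, since CoNLL-U ids can also be ranges such as `1-2`.

```python
# Hypothetical stdlib-only sketch; not the conllu library's implementation.
# The ten column names follow the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_pairs(value):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no value."""
    if value == "_":
        return None
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_conllu(text):
    """Return a list of sentences, each a list of token dictionaries."""
    sentences = []
    # Sentences are separated by blank lines.
    for block in text.strip().split("\n\n"):
        tokens = []
        for line in block.splitlines():
            # Skip empty lines and '#' comment lines such as '# text = ...'.
            if not line.strip() or line.startswith("#"):
                continue
            token = dict(zip(FIELDS, line.split("\t")))
            # Nest the morphological features as their own dictionary.
            token["feats"] = parse_pairs(token["feats"])
            tokens.append(token)
        if tokens:
            sentences.append(tokens)
    return sentences


SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

sentences = parse_conllu(SAMPLE)
```

With this sketch, `sentences[0][0]["form"]` is the first token's word form and `sentences[0][0]["feats"]` is a nested dictionary of its features. The real library handles much more than this, so treat the sketch only as an illustration of the output shape.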
Alternatives and similar repositories for conllu:
Users who are interested in conllu are comparing it to the libraries listed below.
- Various utilities for processing the data. ☆207 Updated this week
- A minimal, pure Python library to interface with CoNLL-U format files. ☆148 Updated last year
- A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology… ☆221 Updated 2 years ago
- spaCy + UDPipe ☆160 Updated 2 years ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆375 Updated 2 months ago
- Universal Dependencies online documentation ☆281 Updated this week
- Text tokenization and sentence segmentation (segtok v2) ☆201 Updated 2 years ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆189 Updated 4 years ago
- Unsupervised Statistical Machine Translation ☆229 Updated 4 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆357 Updated last year
- Open-Source Machine Translation Quality Estimation in PyTorch ☆230 Updated 2 years ago
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆253 Updated 5 months ago
- Language independent truecaser in Python. ☆160 Updated 3 years ago
- A frame-semantic parsing system based on a softmax-margin SegRNN. ☆229 Updated 2 years ago
- Python framework for processing Universal Dependencies data ☆55 Updated last week
- English data ☆205 Updated this week
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆730 Updated 6 months ago
- Disambiguate is a tool for training and using state of the art neural WSD models ☆59 Updated 2 years ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the… ☆244 Updated last year
- LASER multilingual sentence embeddings as a pip package ☆224 Updated last year
- Named Entity Recognition based on dictionaries ☆242 Updated 5 years ago
- This is a CoNLL formatted version of the OntoNotes 5.0 release. ☆190 Updated 10 years ago
- Efficient Low-Memory Aligner ☆141 Updated last month
- Automatically exported from code.google.com/p/universal-pos-tags ☆129 Updated 2 years ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated last year
- CONLL-U to Pandas DataFrame ☆31 Updated 7 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆154 Updated 8 months ago
- Python library for Natural Language Preprocessing (NLPre) ☆190 Updated last year
- Python port of Moses tokenizer, truecaser and normalizer ☆489 Updated 8 months ago
- Ten Thousand German News Articles Dataset for Topic Classification ☆84 Updated 2 years ago