LanguageMachines / ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules…
☆65Updated last week
Related projects
Alternatives and complementary repositories for ucto
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg…☆124Updated this week
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…☆110Updated 4 months ago
- TiMBL implements several memory-based learning algorithms.☆46Updated last week
- A tool for automatic spelling normalization☆20Updated 3 years ago
- Ukb: graph-based WSD and similarity☆106Updated 6 months ago
- Various utilities for processing the data.☆207Updated this week
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format☆22Updated 6 years ago
- Python framework for processing Universal Dependencies data☆57Updated this week
- Thot toolkit for statistical machine translation☆50Updated 2 years ago
- Machine translation for the real world☆23Updated 4 years ago
- ConllEditor is a tool to edit dependency syntax trees in CoNLL-U format.☆54Updated 2 weeks ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component☆12Updated last year
- German Morphological Analyzer☆47Updated 3 years ago
- Named Entity Recognition data for Europeana Newspapers☆173Updated last year
- Hierarchical phrase-based machine translation system☆32Updated 9 years ago
- Learning by Reading pipeline of NLP and Entity Linking tools☆82Updated last year
- Text-Induced Corpus Clean-up☆20Updated last year
- Parsito: Fast non-projective transition-based dependency parser☆14Updated last year
- The Global WordNet Association Collaborative Inter-Lingual Index☆40Updated 2 weeks ago
- Language Tool style grammar handling with spaCy 2.0☆42Updated 6 years ago
- A Named-Entity Recogniser based on Grobid.☆49Updated 2 months ago
- Multi Tier Annotation Search☆26Updated 3 years ago
- FoLiA library for C++☆16Updated this week
- A tool for text normalisation via character-level machine translation☆13Updated 4 years ago
- Extension of the mate-tools NLP pipeline☆67Updated 8 years ago
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with…☆70Updated this week
- A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used st…☆23Updated last year
- A simple configurable tool for manipulating dependency trees.☆13Updated 6 months ago
- CRF-based Morphological Tagging and Lemmatization☆35Updated 5 years ago
- Open-source tools for morphological tagging, segmentation and stemming.☆41Updated 5 years ago