LanguageMachines / ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits text into sentences. It also offers other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules…
☆69 Updated 3 weeks ago
Alternatives and similar repositories for ucto
Users that are interested in ucto are comparing it to the libraries listed below
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆127 Updated 6 months ago
- Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl,… ☆77 Updated this week
- A tool for automatic spelling normalization ☆20 Updated 4 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 Updated 5 months ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆65 Updated last year
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 7 years ago
- Various utilities for processing the data. ☆210 Updated last week
- Thot toolkit for statistical machine translation ☆53 Updated 2 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆12 Updated last year
- TiMBL implements several memory-based learning algorithms. ☆52 Updated 3 weeks ago
- Treex NLP framework ☆32 Updated 2 weeks ago
- Ukb: graph-based WSD and similarity ☆106 Updated last year
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆197 Updated 4 years ago
- This repository contains the Framester resource, the main outcome of the framester project. ☆33 Updated 5 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 2 years ago
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆75 Updated last month
- Specification of NAF, the NLP annotation format ☆21 Updated 4 years ago
- Python bindings to the Dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser… ☆49 Updated 3 months ago
- Multi Tier Annotation Search ☆26 Updated 4 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 Updated 2 years ago
- Hierarchical phrase-based machine translation system ☆32 Updated 10 years ago
- Universal Dependencies online documentation ☆287 Updated this week
- Learning by Reading pipeline of NLP and Entity Linking tools ☆85 Updated 2 years ago
- Open-source tools for morphological tagging, segmentation and stemming. ☆40 Updated 6 years ago
- FoLiA library for C++ ☆16 Updated 3 weeks ago
- Socially-Equitable Language Identification ☆78 Updated 2 years ago
- Extension of the mate-tools NLP pipeline ☆67 Updated 9 years ago
- General-Purpose Neural Networks for Sentence Boundary Detection ☆73 Updated 2 years ago
- eXternally configurable REference and Non Named Entity Recognizer ☆17 Updated last year