LanguageMachines / ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules…
☆68 Updated this week
Alternatives and similar repositories for ucto
Users who are interested in ucto are comparing it to the libraries listed below.
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆112 Updated 5 months ago
- A tool for automatic spelling normalization ☆20 Updated 4 years ago
- Various utilities for processing the data. ☆209 Updated this week
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆127 Updated 6 months ago
- Multi Tier Annotation Search ☆26 Updated 4 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆64 Updated last year
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆12 Updated last year
- Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl,… ☆77 Updated 2 weeks ago
- Hierarchical phrase-based machine translation system ☆32 Updated 10 years ago
- TiMBL implements several memory-based learning algorithms. ☆52 Updated 2 weeks ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆17 Updated last week
- texrex web page cleaning & ClaraX random walk crawler ☆11 Updated 3 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 Updated last year
- German Morphological Analyzer ☆47 Updated 3 years ago
- Open-source tools for morphological tagging, segmentation and stemming. ☆40 Updated 5 years ago
- Text-Induced Corpus Clean-up ☆20 Updated 2 years ago
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages. ☆23 Updated last year
- Thot toolkit for statistical machine translation ☆53 Updated 2 years ago
- Bilingual sentence aligner (Gale & Church, 1993) ☆14 Updated 6 years ago
- A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used st… ☆24 Updated 5 months ago
- This repository contains the Framester resource, the main outcome of the framester project. ☆33 Updated 5 years ago
- A tool for text normalisation via character-level machine translation ☆13 Updated 5 years ago
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆75 Updated 2 weeks ago
- Ukb: graph-based WSD and similarity ☆106 Updated last year
- A simple configurable tool for manipulating dependency trees. ☆13 Updated 6 months ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated 2 years ago
- General-Purpose Neural Networks for Sentence Boundary Detection ☆73 Updated 2 years ago
- eXternally configurable REference and Non Named Entity Recognizer ☆17 Updated last year
- Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist ☆36 Updated 6 years ago
- Learning by Reading pipeline of NLP and Entity Linking tools ☆85 Updated 2 years ago