LanguageMachines / ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules…
☆69Updated 2 months ago
Alternatives and similar repositories for ucto
Users who are interested in ucto compare it to the libraries listed below.
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg…☆129Updated 9 months ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…☆113Updated 7 months ago
- Various utilities for processing the data.☆211Updated this week
- Ukb: graph-based WSD and similarity☆106Updated last year
- Thot toolkit for statistical machine translation☆53Updated 2 years ago
- General-Purpose Neural Networks for Sentence Boundary Detection☆73Updated 2 years ago
- Excitement Open Platform for Recognizing Textual Entailments☆88Updated 7 years ago
- Named Entity Recognition data for Europeana Newspapers☆173Updated 2 years ago
- This repository contains the Framester resource, the main outcome of the framester project.☆33Updated 5 years ago
- Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl,…☆78Updated last month
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format☆22Updated 7 years ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation☆197Updated 4 years ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the…☆249Updated 2 years ago
- Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!".☆63Updated 9 years ago
- A compound splitter based on the semantic regularities in the vector space of word embeddings.☆16Updated 8 years ago
- German Morphological Analyzer☆47Updated 3 years ago
- Hierarchical phrase-based machine translation system☆32Updated 10 years ago
- Python bindings to the Dutch NLP tool Frog (POS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser…☆49Updated 5 months ago
- A tool for automatic spelling normalization☆20Updated 4 years ago
- Learning by Reading pipeline of NLP and Entity Linking tools☆85Updated 2 years ago
- Universal Dependencies online documentation☆289Updated this week
- Machine translation for the real world☆23Updated 5 years ago
- FreeLing project source code☆259Updated 2 years ago
- Automatically exported from code.google.com/p/universal-pos-tags☆130Updated 3 years ago
- A tool for text normalisation via character-level machine translation☆13Updated 5 years ago
- Multi Tier Annotation Search☆26Updated 4 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu…☆65Updated last year
- Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/☆17Updated 7 years ago
- Federated Knowledge Extraction Framework☆193Updated last year
- ConllEditor is a tool to edit dependency syntax trees in CoNLL-U format.☆57Updated last month