LanguageMachines / ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules…
☆70 Updated 2 weeks ago
Alternatives and similar repositories for ucto
Users who are interested in ucto compare it with the libraries listed below.
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆130 Updated last year
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 Updated 11 months ago
- TiMBL implements several memory-based learning algorithms. ☆53 Updated 2 weeks ago
- Various utilities for processing the data. ☆216 Updated this week
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 3 years ago
- Ukb: graph-based WSD and similarity ☆107 Updated last year
- Open-source tools for morphological tagging, segmentation and stemming. ☆40 Updated 6 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 Updated 2 years ago
- Universal Dependencies online documentation ☆287 Updated this week
- Thot toolkit for statistical machine translation ☆53 Updated 3 years ago
- A simple configurable tool for manipulating dependency trees. ☆14 Updated last year
- Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!". ☆63 Updated 10 years ago
- Treex NLP framework ☆32 Updated last month
- Python bindings to the Dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser… ☆49 Updated 9 months ago
- Learning by Reading pipeline of NLP and Entity Linking tools ☆85 Updated 3 years ago
- Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl,… ☆79 Updated 2 weeks ago
- Multi Tier Annotation Search ☆26 Updated 4 years ago
- A multilingual dependency parser based on linear programming relaxations. ☆115 Updated 6 years ago
- FoLiA library for C++ ☆17 Updated 2 weeks ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆198 Updated 5 years ago
- The Community-enRiched Open WordNet (CROWN) ☆18 Updated 10 years ago
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆75 Updated 2 months ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆66 Updated 3 weeks ago
- Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/ ☆17 Updated 7 years ago
- Extension of the mate-tools NLP pipeline ☆67 Updated 9 years ago
- FreeLing project source code ☆261 Updated 2 years ago
- BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser) See http://pypi.python.org/pypi/b… ☆228 Updated 4 years ago
- GermaNER: Free Open German Named Entity Recognition Tool ☆36 Updated 2 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 7 years ago
- Named Entity Recognition data for Europeana Newspapers ☆173 Updated 2 years ago