LanguageMachines / ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules…
☆70 · Updated last month
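To make the three basic steps in the description concrete (separating words from punctuation, splitting sentences, changing case), here is a minimal Python sketch. It is a hypothetical, naive regex illustration only. It is not ucto's rule-based engine, its configuration, or its API, and the function names `tokenize`, `split_sentences` and `lowercase` are invented for this example.

```python
import re

# Hypothetical illustration only: a naive regex tokeniser, NOT ucto's
# rule-based engine or API. A word is a run of Unicode word characters,
# optionally joined by internal hyphens or apostrophes ("don't", "e-mail");
# every other non-space character becomes its own punctuation token.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Tokens that close a sentence in this simplified sketch.
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(tokens: list[str]) -> list[list[str]]:
    """Group tokens into sentences, closing a sentence after '.', '!' or '?'."""
    sentences: list[list[str]] = []
    current: list[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack final punctuation
        sentences.append(current)
    return sentences


def lowercase(tokens: list[str]) -> list[str]:
    """A simple case-changing preprocessing step."""
    return [t.lower() for t in tokens]


if __name__ == "__main__":
    toks = tokenize("Hello, world! Ucto splits text.")
    print(toks)
    print(split_sentences(toks))
    print(lowercase(toks))
```

Because Python 3's `\w` is Unicode-aware for `str` patterns, words such as "café" stay whole. A naive splitter like this one breaks on abbreviations ("Dr."), numbers ("3.5"), URLs and similar cases; handling such cases is the kind of job that language-specific tokenisation rules are for.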
Alternatives and similar repositories for ucto
Users who are interested in ucto are comparing it to the libraries listed below.
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆129 · Updated 11 months ago
- Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl,… ☆79 · Updated 3 weeks ago
- Various utilities for processing the data. ☆215 · Updated this week
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 · Updated 10 months ago
- Ukb: graph-based WSD and similarity ☆107 · Updated last year
- Learning by Reading pipeline of NLP and Entity Linking tools ☆85 · Updated 3 years ago
- Python bindings to the Dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser… ☆49 · Updated 8 months ago
- Machine translation for the real world ☆23 · Updated 5 years ago
- General-Purpose Neural Networks for Sentence Boundary Detection ☆73 · Updated 2 years ago
- GermaNER: Free Open German Named Entity Recognition Tool ☆36 · Updated last year
- Federated Knowledge Extraction Framework ☆193 · Updated 2 years ago
- A tool for automatic spelling normalization ☆20 · Updated 4 years ago
- Python framework for processing Universal Dependencies data ☆57 · Updated 3 weeks ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆197 · Updated 5 years ago
- Extension of the mate-tools NLP pipeline ☆66 · Updated 9 years ago
- Multi Tier Annotation Search ☆26 · Updated 4 years ago
- Parsito: Fast non-projective transition-based dependency parser ☆14 · Updated 2 weeks ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the… ☆249 · Updated 2 years ago
- A compound splitter based on the semantic regularities in the vector space of word embeddings. ☆16 · Updated 8 years ago
- FreeLing project source code ☆261 · Updated 2 years ago
- Treex NLP framework ☆32 · Updated 3 weeks ago
- Software and resources for natural language processing. ☆132 · Updated 9 years ago
- Universal Dependencies online documentation ☆288 · Updated last week
- German Morphological Analyzer ☆51 · Updated 4 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆66 · Updated last year
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆69 · Updated 2 years ago
- Thot toolkit for statistical machine translation ☆53 · Updated 3 years ago
- FoLiA library for C++ ☆17 · Updated this week
- A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically-rich languages (Czech, … ☆34 · Updated 6 years ago
- TiMBL implements several memory-based learning algorithms. ☆53 · Updated last week