proycon/python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser writt…

☆32

Alternatives and similar repositories for python-ucto

Users that are interested in python-ucto are comparing it to the libraries listed below.

marthyns / mlmapp
A multi-level marketing web application -- matrix type
☆16 · Aug 20, 2015 · Updated 10 years ago
proycon / python-timbl
python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi…
☆18 · May 2, 2025 · Updated last year
emanjavacas / cosycat
Collaborative Synchronized Corpus Annotation Tool
☆10 · Dec 31, 2018 · Updated 7 years ago
bhaskar-mitra / Demos
A bag of miscellaneous demos!
☆13 · Feb 5, 2017 · Updated 9 years ago
longwind48 / convo-miner
Mine conversations from novels in Project Gutenberg to generate data for data-driven dialogue systems.
☆15 · May 7, 2019 · Updated 7 years ago
proycon / python-frog
Python bindings to the Dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser…
☆50 · Feb 2, 2026 · Updated 5 months ago
concord / concord-py
Python client library
☆10 · Feb 15, 2017 · Updated 9 years ago
Exmaralda-Org / exmaralda
☆33 · Updated this week
alecthomas / SublimeFoldPythonDocstrings
Automatically folds Python docstrings longer than 1 line.
☆16 · Dec 2, 2023 · Updated 2 years ago
avjves / textreuse-blast
Software to detect text reuse with BLAST.
☆13 · Oct 8, 2019 · Updated 6 years ago
swarajban / multithreadedWordCounting
Word count of a large file using a prefix tree and parallel Python processes
☆18 · Sep 25, 2013 · Updated 12 years ago
ajaech / calm
Context Aware Language Models
☆28 · Jul 3, 2018 · Updated 8 years ago
platinprotocol / solidityGEO
Solidity Library for GIS Objects
☆13 · Nov 6, 2018 · Updated 7 years ago
kentdlee / MLComp
A Compiler and Type Inference System for a subset of Standard ML called Small.
☆14 · May 19, 2017 · Updated 9 years ago
huggingface / adversarialnlp
A generic library for crafting adversarial NLP examples - WIP
☆42 · Oct 26, 2018 · Updated 7 years ago
emanjavacas / pie
A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.
☆25 · Oct 27, 2023 · Updated 2 years ago
jtcho / FairMachineLearning
Implementation of provably Rawlsian fair ML algorithms for contextual bandits.
☆14 · May 10, 2017 · Updated 9 years ago
neunhoef / ArangoDBStarter
A tool to start ArangoDB clusters and single servers conveniently.
☆13 · Feb 10, 2017 · Updated 9 years ago
LanguageMachines / libfolia
FoLiA library for C++
☆18 · Mar 25, 2026 · Updated 3 months ago
dongguosheng / deepwalk
Weighted DeepWalk implementation in C++
☆18 · Feb 8, 2017 · Updated 9 years ago
machinalis / yalign
A sentence aligner for comparable corpora
☆131 · May 19, 2016 · Updated 10 years ago
riedlma / sequence_tagging
Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German
☆26 · May 10, 2021 · Updated 5 years ago
gchrupala / first-steps-ml
First steps in Machine Learning
☆12 · Mar 18, 2015 · Updated 11 years ago
proycon / colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg…
☆131 · Feb 5, 2026 · Updated 5 months ago
EducationalTestingService / python-zpar
A Python wrapper around the ZPar parser for English.
☆50 · Apr 20, 2021 · Updated 5 years ago
tastyminerals / ccrawl
Simple CORPORA list crawler
☆11 · Dec 2, 2016 · Updated 9 years ago
jtourille / yaset
Yet Another SEquence Tagger
☆10 · Dec 8, 2022 · Updated 3 years ago
coastalcph / supersense-data-twitter
Tweets annotated with coarse-grained sense labels (supersenses)
☆13 · Jun 13, 2014 · Updated 12 years ago
nicmer / analysis-zeitonline
This repository contains machine-readable versions of the six German major parties' electoral programs for 2017 federal elections and cod…
☆16 · Aug 31, 2017 · Updated 8 years ago
minerva-ml / steppy-toolkit
Curated set of transformers that make your work with steppy faster and more effective
☆23 · Nov 22, 2018 · Updated 7 years ago
coastalcph / rungsted
Fast structured perceptron sequential labeler
☆15 · Dec 8, 2015 · Updated 10 years ago
thoppe / deep-phonics
Deep learning spelling patterns with a recurrent neural network
☆11 · Jun 5, 2017 · Updated 9 years ago
allenai / ml
Re-usable low-level ML components
☆10 · Oct 31, 2018 · Updated 7 years ago
abunuwas / manning-twitch-build-deploy-api
Repository for Manning Twitch session about building and deploying APIs with Python
☆12 · Jul 19, 2021 · Updated 5 years ago
TheClimateCorporation / S3DistVersions
Distributed version restore tool for S3
☆12 · Jan 5, 2015 · Updated 11 years ago
Supervisor / meld3
Unmaintained templating system used by old versions of Supervisor
☆21 · Nov 15, 2022 · Updated 3 years ago
dbeanm / UMLS-Neo4j
Load all concepts and relationships from UMLS into a Neo4j database
☆13 · Jan 29, 2021 · Updated 5 years ago
Mrpatekful / cluster
GPU accelerated K-Means and Mean Shift clustering in Tensorflow.
☆11 · Sep 24, 2018 · Updated 7 years ago
JHLiu7 / EarlyDRGPrediction
☆12 · Apr 13, 2023 · Updated 3 years ago