davidmogar/cucco

davidmogar / cucco

Text normalization library for Python

☆201

Alternatives and similar repositories for cucco

Users who are interested in cucco are comparing it to the libraries listed below.

pudo / normality
A tiny library for Python text normalisation. Useful for ad-hoc text processing.
☆157 · Mar 8, 2026 · Updated 3 months ago
blacklight / Takk
Speech recognition in Python made easy and flexible
☆11 · Sep 12, 2015 · Updated 10 years ago
idiap / asrt
Various scripts that facilitate the preparation of Automatic Speech Recognition related resources
☆17 · Apr 16, 2020 · Updated 6 years ago
soshial / text-normalization
Python tool for normalizing text and text canonicalization (DISCONTINUED)
☆41 · Sep 3, 2013 · Updated 12 years ago
piskvorky / sparsesvd
Python wrapper around SVDLIBC, a fast library for sparse Singular Value Decomposition
☆55 · Aug 16, 2013 · Updated 12 years ago
gouwsmeister / TextCleanser
Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!".
☆62 · Oct 1, 2015 · Updated 10 years ago
dufferzafar / what-to-watch
Python script to help you decide what movie to watch.
☆34 · Aug 25, 2015 · Updated 10 years ago
shamidreza / unitselection
A Python implementation of a simple Unit Selection Text-to-Speech (TTS) synthesis system. It works with CMU-Arctic data by default.
☆11 · Mar 14, 2015 · Updated 11 years ago
MiniXC / opensubtitles-dataloader
Loads OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with pytorch.
☆13 · Aug 26, 2020 · Updated 5 years ago
jgeskens / django-tinyschedule
A small Django app for managing schedules
☆13 · Dec 26, 2022 · Updated 3 years ago
noisy-text / noisy-text.github.io
Workshop on Noisy User-generated Text (W-NUT)
☆31 · Jun 21, 2026 · Updated 2 weeks ago
willf / segment
A tool to segment text based on frequencies and the Viterbi algorithm, e.g. "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived']
☆79 · Apr 23, 2016 · Updated 10 years ago
CogComp / cogcomp-nlpy
CogComp's light-weight Python NLP annotators
☆115 · Feb 18, 2019 · Updated 7 years ago
tsproisl / SoMaJo
A tokenizer and sentence splitter for German and English web and social media texts.
☆152 · Dec 9, 2024 · Updated last year
davidsbatista / information-extraction-PT
An example of triples extraction with PoS-tags using ReVerb
☆17 · May 23, 2017 · Updated 9 years ago
RaRe-Technologies / topic_eval
Tools and services for evaluating topic models
☆15 · Apr 12, 2016 · Updated 10 years ago
nlx-group / LX-DSemVectors
Distributional Semantics Models for Portuguese
☆26 · Jul 4, 2020 · Updated 6 years ago
revdotcom / words2num
Convert words to numbers
☆21 · Apr 13, 2022 · Updated 4 years ago
jwass / geog
Quick and easy geographical functions in Python
☆41 · Apr 1, 2022 · Updated 4 years ago
wenet-e2e / WeTextProcessing.deprecated
☆61 · Jan 31, 2023 · Updated 3 years ago
catherinedevlin / python_learners_glossary
Definitions of Python jargon to help Python beginners understand Pythonista gobbledygook
☆55 · Mar 5, 2020 · Updated 6 years ago
MiniXC / LightningFastSpeech2
☆55 · Jan 13, 2023 · Updated 3 years ago
scavallari / Topic2Vec
☆21 · May 24, 2016 · Updated 10 years ago
word-fish / wordfish-python
Extract relationships from standardized terms from a corpus of interest with deep learning
☆19 · Dec 31, 2019 · Updated 6 years ago
ksingla025 / Speaker_Dia_RedHen
This is the home directory of the speaker diarization module being developed for heterogeneous news data in RedHen Labs as a GSOC project
☆10 · Sep 11, 2015 · Updated 10 years ago
davecarpie / scli
A selectable, scrollable list interface for terminal applications built using curses
☆10 · Jun 30, 2015 · Updated 11 years ago
ispmarin / maps
Test several Python map frameworks
☆11 · Feb 16, 2016 · Updated 10 years ago
jtkim-kaist / end-point-detection
☆10 · Sep 19, 2018 · Updated 7 years ago
lumenrobot / relex-id
Semantic dependency relationship extractor for Indonesian... including slang and alay ;) (inspired by OpenCog RelEx)
☆10 · Oct 2, 2015 · Updated 10 years ago
tweekmonster / moult
A utility for finding Python packages that may not be in use.
☆50 · Feb 11, 2017 · Updated 9 years ago
fnl / segtok
Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe…
☆170 · Dec 15, 2021 · Updated 4 years ago
yhat / yhat-examples
Some examples of Yhat
☆23 · Jun 11, 2014 · Updated 12 years ago
apraditya / indonesian_stemmer
Porter Stemmer for Bahasa Indonesia
☆25 · Aug 10, 2015 · Updated 10 years ago
impresso / named-entity-tutorial-dh2019
Tutorial on NE processing for Digital Humanities - DH Utrecht 2019
☆24 · Jul 18, 2019 · Updated 6 years ago
BramVanroy / spacy-extreme
An example of how to use spaCy for extremely large files without running into memory issues
☆36 · Sep 17, 2022 · Updated 3 years ago
mrocklin / dasklearn
Dask powered gridsearch and pipeline a la scikit-learn
☆42 · Nov 2, 2015 · Updated 10 years ago
TanUkkii007 / deepvoice3-tensorflow
A tensorflow based implementation of DeepVoice3 https://arxiv.org/abs/1710.07654
☆13 · Jun 5, 2018 · Updated 8 years ago
EFord36 / normalise
A module for normalising text.
☆172 · Oct 27, 2021 · Updated 4 years ago
mrocklin / dask-spark
Dask and Spark interactions
☆21 · Mar 13, 2017 · Updated 9 years ago