davidmogar / cucco
Text normalization library for Python
⭐ 204 · Updated 7 years ago
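This page does not show cucco's API. As a rough illustration of what a text normalization pipeline of this kind typically does, the following standard-library Python sketch strips accent marks, removes punctuation, collapses whitespace, and lowercases. The function names (`strip_accents`, `remove_punctuation`, `collapse_whitespace`, `normalize`) are hypothetical and are not cucco's interface.

```python
import re
import string
import unicodedata

# Hypothetical helper names for illustration; this is not cucco's API.


def strip_accents(text: str) -> str:
    # Decompose characters (NFKD) and drop combining marks, e.g. "é" -> "e".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_punctuation(text: str) -> str:
    # Drop ASCII punctuation characters only (string.punctuation).
    return text.translate(str.maketrans("", "", string.punctuation))


def collapse_whitespace(text: str) -> str:
    # Replace any run of whitespace with a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def normalize(text: str) -> str:
    # Apply the steps in a fixed order, then lowercase the result.
    for step in (strip_accents, remove_punctuation, collapse_whitespace):
        text = step(text)
    return text.lower()


if __name__ == "__main__":
    print(normalize("  Café   au lait!  "))
```

Applied to `"  Café   au lait!  "`, this returns `"cafe au lait"`. Keeping each step as a separate function makes it easy to reorder or drop steps; note that removing punctuation before collapsing whitespace joins words separated only by punctuation, so `"a,b"` becomes `"ab"`.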
Alternatives and similar repositories for cucco
Users who are interested in cucco are comparing it to the libraries listed below.
- 💫 Scripts, tools and resources for developing spaCy · ⭐ 126 · Updated 6 years ago
- Goal: make Pattern compatible with Python 3. · ⭐ 59 · Updated 5 years ago
- NLTK Contrib · ⭐ 166 · Updated last year
- Python wrapper for aspell (C extension and python version) · ⭐ 82 · Updated last year
- A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] · ⭐ 81 · Updated 9 years ago
- Language Lego · ⭐ 141 · Updated 5 years ago
- Python wrapper for Stanford CoreNLP tools · ⭐ 58 · Updated 9 years ago
- Snowball stemming library collection for Python · ⭐ 121 · Updated 6 years ago
- Python tool for normalizing text and text canonicalization (DISCONTINUED) · ⭐ 41 · Updated 11 years ago
- PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an… · ⭐ 477 · Updated last year
- Python utilities for detecting textual reuse · ⭐ 21 · Updated 9 years ago
- Tokenize English sentences using neural networks. · ⭐ 63 · Updated 7 years ago
- A Python implementation of the Metaphone and Double Metaphone algorithms · ⭐ 81 · Updated last year
- A toolkit for corpus linguistics · ⭐ 204 · Updated 6 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… · ⭐ 170 · Updated 3 years ago
- Fast Word Clustering Software · ⭐ 78 · Updated 4 months ago
- This is a mirror of the script by Giuseppe Attardi, and contains history before the official repo started: https://github.com/attardi/wik… · ⭐ 259 · Updated 8 years ago
- Python bindings to the Compact Language Detector · ⭐ 33 · Updated 5 years ago
- TheanoLM is a recurrent neural network language modeling tool implemented using Theano · ⭐ 81 · Updated last year
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian. · ⭐ 154 · Updated 7 months ago
- Python Set subclass that supports searching by ngram similarity · ⭐ 119 · Updated 3 years ago
- A series of scripts to download and parse the OpenSubtitles corpus. · ⭐ 86 · Updated 9 years ago
- A Multilingual and Multilevel Representation Learning Toolkit for NLP · ⭐ 117 · Updated 7 years ago
- Lightning Fast Language Prediction · ⭐ 167 · Updated 6 years ago
- Python bindings for libwapiti · ⭐ 67 · Updated 5 years ago
- Language detection extension for spaCy 2.0+ · ⭐ 113 · Updated 6 years ago
- Collects all tweets from the sample Public stream using Twitter's streaming API, and saves them to a file for later use as a corpus. · ⭐ 45 · Updated 4 years ago
- Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!". · ⭐ 63 · Updated 9 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) · ⭐ 67 · Updated 8 years ago
- Reduction is a python script which automatically summarizes a text by extracting the sentences which are deemed to be most important. · ⭐ 55 · Updated 10 years ago
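One entry in the list describes segmenting text "based on frequencies and the Viterbi algorithm". A minimal sketch of that general technique, assuming a simple unigram model, is shown below; it is not that tool's code. The `COUNTS` table is a tiny made-up frequency list for illustration, and the length-based penalty for unknown words is an arbitrary assumption.

```python
import math

# Hypothetical toy unigram counts (made up for illustration); a real
# segmenter would load word frequencies from a large corpus.
COUNTS = {"#": 10, "the": 500, "boy": 60, "who": 120, "lived": 30, "live": 40, "he": 200}
TOTAL = sum(COUNTS.values())


def word_logprob(word: str) -> float:
    # Known words: log of relative frequency (case-insensitive lookup).
    # Unknown words: assumed penalty that grows with word length.
    key = word.lower()
    if key in COUNTS:
        return math.log(COUNTS[key] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text: str, max_len: int = 20) -> list[str]:
    # best[i] = (log score of best split of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Backtrack from the end of the string using the stored start indices.
    words = []
    end = len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    print(segment("#TheBoyWhoLived"))
```

With these toy counts, `segment("#TheBoyWhoLived")` returns `['#', 'The', 'Boy', 'Who', 'Lived']`, matching the example in the listing. Working in log space avoids underflow when combining many small probabilities, and limiting candidate words to `max_len` characters keeps the search at O(n · max_len) lookups.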