jonsafari/clustercat

Fast Word Clustering Software

☆79

Alternatives and similar repositories for clustercat

Users interested in clustercat compare it to the repositories listed below.

jonsafari / habeas-corpus
Command-line corpus tools
☆12 · May 15, 2017 · Updated 9 years ago
jonsafari / tok-tok
A fast, simple, multilingual tokenizer
☆29 · May 24, 2017 · Updated 9 years ago
coastalcph / rungsted
Fast structured perceptron sequential labeler
☆15 · Dec 8, 2015 · Updated 10 years ago
amittai / cynical
Cynical data selection
☆20 · Jan 16, 2021 · Updated 5 years ago
karlstratos / minitagger
☆21 · Apr 4, 2015 · Updated 11 years ago
percyliang / brown-cluster
C++ implementation of the Brown word clustering algorithm.
☆429 · Sep 10, 2023 · Updated 2 years ago
tastyminerals / ccrawl
Simple CORPORA list crawler
☆11 · Dec 2, 2016 · Updated 9 years ago
techiaith / docker-moses-smt
Making MosesSMT machine translation easier for Welsh (and other languages)
☆16 · Aug 25, 2021 · Updated 4 years ago
leondz / entity_recognition
Framework for doing NER and other types of entity recognition, in Python
☆68 · Jun 21, 2022 · Updated 4 years ago
sean-chester / generalised-brown
C++ implementation of Generalised Brown clustering and Python scripts for feature generation
☆41 · Apr 8, 2016 · Updated 10 years ago
ivan-zapreev / Distributed-Translation-Infrastructure
The distributed statistical machine translation infrastructure consisting of load balancing, text pre/post-processing and translation ser…
☆12 · Nov 29, 2018 · Updated 7 years ago
elexis-eu / MWSA
Datasets for the Monolingual Word Sense Alignment (MWSA) task
☆12 · Nov 10, 2020 · Updated 5 years ago
OpenNMT / Recipes
Recipes for training OpenNMT systems
☆14 · Jul 26, 2017 · Updated 8 years ago
karlmoritz / bicvm
BiCVM Code
☆45 · May 14, 2018 · Updated 8 years ago
burningion / crappy-product-generator
Generate crappy products and reviews using Amazon's dataset
☆17 · Jan 11, 2016 · Updated 10 years ago
duyvuleo / Transformer-DyNet
An Implementation of Transformer (Attention Is All You Need) in DyNet
☆64 · Nov 30, 2023 · Updated 2 years ago
senarvi / theanolm
TheanoLM is a recurrent neural network language modeling tool implemented using Theano
☆81 · Jun 20, 2024 · Updated 2 years ago
ufal / parsito
Parsito: Fast non-projective transition-based dependency parser
☆14 · Nov 24, 2025 · Updated 7 months ago
robertostling / efmaral
Efficient Markov Chain word alignment
☆53 · Aug 1, 2021 · Updated 4 years ago
jwieting / paragram-word
Python code for training Paragram word embeddings. These achieve human-level performance on some word similarity tasks including SimLex-9…
☆30 · Feb 4, 2016 · Updated 10 years ago
amake / moses-smt
Dock You a Moses: Moses Statistical MT in a container
☆14 · Feb 18, 2020 · Updated 6 years ago
Roxot / AEVNMT
Auto-Encoding Variational Neural Machine Translation
☆16 · Jan 22, 2020 · Updated 6 years ago
clab / wikipedia-parallel-titles
Tools for extracting parallel corpora from article titles across languages in Wikipedia
☆74 · Feb 25, 2015 · Updated 11 years ago
alvations / lazyme
Lazy python recipes.
☆10 · Apr 17, 2026 · Updated 3 months ago
lateral / hyperplane-hasher
Implementation of an algorithm computing the nearest "N" neighbours to a vector, using a collection of hyperplane hashers.
☆30 · Jul 17, 2015 · Updated 11 years ago
hal3 / vwnlp
Solving NLP problems with Vowpal Wabbit: Tutorial and more
☆183 · Mar 8, 2016 · Updated 10 years ago
semanticize / st
Semanticizest: dump parser and client
☆20 · May 11, 2016 · Updated 10 years ago
gouwsmeister / bilbowa
Open-source implementation of the BilBOWA (Bilingual Bag-of-Words without Alignments) word embedding model.
☆69 · Jul 28, 2021 · Updated 4 years ago
bob-carpenter / anno
Models, scripts, and data sets for data annotation (aka coding, aka rating)
☆12 · Mar 9, 2015 · Updated 11 years ago
fabianp / pysofia
Old repository; the maintained version is at https://github.com/rth/pysofia
☆27 · May 20, 2016 · Updated 10 years ago
lupanh / Vietnamese-Person-Questions-Dataset
Annotated dataset of Vietnamese questions about people
☆16 · Jul 30, 2015 · Updated 10 years ago
joosthub / pytorch-nlp-tutorial-sf2017
Materials for O'Reilly DL 4 NLP tutorial (SF 2017)
☆25 · Sep 18, 2017 · Updated 8 years ago
jacobBaumbach / MCWMD
Implementation of Monte Carlo Word Movers Distance in Python with TensorFlow
☆12 · Sep 12, 2016 · Updated 9 years ago
ual / rental-listings
Analyzing and visualizing rental listings data
☆12 · Feb 28, 2019 · Updated 7 years ago
allenai / brat
brat rapid annotation tool (brat) - for all your textual annotation needs
☆10 · Feb 3, 2018 · Updated 8 years ago
tetsuok / arowpp
AROW++: An implementation of the efficient confidence-weighted classifier
☆11 · Jan 9, 2021 · Updated 5 years ago
seomoz / qdr
Query-Document Relevance
☆42 · Feb 6, 2015 · Updated 11 years ago
coastalcph / supersense-data-twitter
Tweets annotated with coarse-grained sense labels (supersenses)
☆13 · Jun 13, 2014 · Updated 12 years ago
moses-smt / mgiza
A word alignment tool based on the famous GIZA++, extended to support multi-threading, resume training and incremental training.
☆167 · May 12, 2021 · Updated 5 years ago