proycon/colibri-core

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipul…

☆131

Alternatives and similar repositories for colibri-core

Users that are interested in colibri-core are comparing it to the libraries listed below.

LanguageMachines / libfolia
FoLiA library for C++
☆18 • Mar 25, 2026 • Updated 4 months ago
LanguageMachines / timbl
TiMBL implements several memory-based learning algorithms.
☆55 • Jul 6, 2026 • Updated 3 weeks ago
LanguageMachines / ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr…
☆72 • Updated this week
coastalcph / rungsted
Fast structured perceptron sequential labeler
☆15 • Dec 8, 2015 • Updated 10 years ago
martinreynaert / TICCL
Text-Induced Corpus Clean-up
☆20 • Jun 20, 2023 • Updated 3 years ago
LanguageMachines / ticcltools
Tools for TICCL
☆14 • Dec 12, 2025 • Updated 7 months ago
jonsafari / tok-tok
A fast, simple, multilingual tokenizer
☆29 • May 24, 2017 • Updated 9 years ago
ayoshiaki / tops
☆37 • Jun 10, 2024 • Updated 2 years ago
LanguageMachines / frog
View on GitHub
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl,…
☆82 • Jun 19, 2026 • Updated last month
proycon / pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an…
☆476 • Sep 14, 2023 • Updated 2 years ago
LanguageMachines / PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
☆50 • Sep 7, 2022 • Updated 3 years ago
proycon / python-ucto
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet…
☆32 • Feb 2, 2026 • Updated 5 months ago
nickvosk / acl2015-dataset-learning-to-explain-entity-relationships
Dataset for the ACL 2015 paper: Learning to Explain Entity Relationships in Knowledge Graphs
☆11 • Oct 22, 2015 • Updated 10 years ago
meta-toolkit / meta
A Modern C++ Data Sciences Toolkit
☆714 • Apr 17, 2023 • Updated 3 years ago
riyazbhat / Unsupervised-Second-Order-HMM
Second Order Implementation of Hidden Markov Model for Tagging.
☆15 • Mar 17, 2022 • Updated 4 years ago
proycon / clam
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com…
☆135 • Updated this week
StevenReitsma / sonnet
Winning data science solution for Energy Hack NL 2018. Sonnet: forecasting station load caused by solar panels.
☆11 • May 28, 2018 • Updated 8 years ago
proycon / flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…
☆113 • Jan 24, 2025 • Updated last year
proycon / python-timbl
python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi…
☆18 • May 2, 2025 • Updated last year
cental / PatternSim
A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns.
☆27 • Feb 13, 2016 • Updated 10 years ago
marian-nmt / sotastream
A library for data streaming and augmentation
☆22 • May 5, 2025 • Updated last year
PomanoB / lsse
Serelex - lexico-semantic search engine
☆19 • Mar 19, 2017 • Updated 9 years ago
markhatton / google-ngrams
Shell scripts to assist downloading & processing the Google n-grams corpora
☆13 • Apr 26, 2017 • Updated 9 years ago
P4ndaFR / homesuite-ansible
An ansible playbook to deploy a ready-to-use nextcloud w/ collabora based on https://brendan.abolivier.bzh/your-own-google-drive-docs/ fr…
☆12 • Sep 20, 2019 • Updated 6 years ago
semanticize / semanticizer
Entity Linking for the masses
☆57 • Nov 10, 2015 • Updated 10 years ago
pauldb89 / OxLM
OxLM: Oxford Neural Language Modelling Toolkit
☆39 • Nov 6, 2015 • Updated 10 years ago
clab / knowledge
☆10 • Oct 6, 2015 • Updated 10 years ago
fnan / FeatureBudgetedRandomForest
code for paper "Feature-Budgeted Random Forest" ICML 2015
☆11 • May 10, 2017 • Updated 9 years ago
jkkummerfeld / neural-tagger-tutorial
Exploring implementing a simple tagger using neural network frameworks
☆20 • Oct 24, 2022 • Updated 3 years ago
tastyminerals / ccrawl
Simple CORPORA list crawler
☆11 • Dec 2, 2016 • Updated 9 years ago
pprett / nut
Natural language Understanding Toolkit
☆119 • May 7, 2014 • Updated 12 years ago
yasmina85 / OffTopic-Detection
This repository contains a tool and collection datasets for detecting off-topic pages in Web-archived collections.
☆17 • Aug 20, 2015 • Updated 10 years ago
KIZI / LinkedHypernymsDataset
☆14 • Aug 24, 2021 • Updated 4 years ago
proycon / gecco
Generic Environment for Context-Aware Correction of Orthography
☆24 • Sep 7, 2022 • Updated 3 years ago
andreasvc / disco-dop
Discontinuous Data-Oriented Parsing
☆47 • Jan 5, 2024 • Updated 2 years ago
leondz / entity_recognition
framework for doing NER and other types of entity recognition, in Python
☆68 • Jun 21, 2022 • Updated 4 years ago
alvations / nltk_cli
☆20 • Apr 26, 2017 • Updated 9 years ago
stickeritis / sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
☆13 • Dec 18, 2020 • Updated 5 years ago
commonsense / conceptdb
A platform for storing large semantic networks on MongoDB
☆22 • Jun 20, 2011 • Updated 15 years ago