explosion/curated-tokenizers

Lightweight piece tokenization library

☆12

Alternatives and similar repositories for curated-tokenizers

Users who are interested in curated-tokenizers are comparing it to the libraries listed below.

explosion / os-signpost
Wrapper for the macOS signpost API
☆18 · Apr 24, 2023 · Updated 3 years ago
oxidized-transformers / oxidized-transformers
Modular Rust transformer/LLM library using Candle
☆39 · May 5, 2024 · Updated 2 years ago
AzureCosmosDB / AISamples
Central hub for demos, code snippets, and other assets for Azure Cosmos DB for AI apps.
☆13 · Apr 9, 2025 · Updated last year
explosion / wikid
Generate a SQLite database from Wikipedia & Wikidata dumps.
☆39 · Mar 27, 2024 · Updated 2 years ago
cwhy / rwkv-decon
Trying to deconstruct RWKV in understandable terms
☆14 · May 6, 2023 · Updated 3 years ago
sleepinyourhat / quora-duplicate-questions-util
Converts Quora's new NLU dataset to SNLI txt/jsonl format, plus test/dev split, tokenization.
☆14 · Jan 27, 2017 · Updated 9 years ago
neulab / cmulab
CMU Linguistic Annotation Backend
☆15 · Sep 22, 2025 · Updated 9 months ago
wjbmattingly / ww2-spacy
☆17 · Jan 5, 2023 · Updated 3 years ago
ltgoslo / norec_fine
Fine-grained sentiment annotations of NoReC
☆20 · Aug 1, 2022 · Updated 3 years ago
tokestermw / spacy_kenlm
KenLM extension for spaCy 2.0.
☆16 · Dec 6, 2017 · Updated 8 years ago
saqimtiaz / tw5-feeds
Experimental plugin to add support for RSS and JSON feeds to TiddlyWiki
☆10 · Jan 9, 2022 · Updated 4 years ago
sebpuetz / lumberjack
Read and modify constituency trees in Rust.
☆10 · May 5, 2020 · Updated 6 years ago
malvex / sheppy
A modern, fast, and easy to use task queue system for async Python
☆15 · Jul 13, 2026 · Updated last week
conda-forge / spacy-feedstock
A conda-smithy repository for spacy.
☆14 · Apr 23, 2026 · Updated 2 months ago
boehm-e / spacy-raspberry
A raspberry pi 64bit image with spacy and neuralcoref pre-installed
☆21 · Oct 16, 2019 · Updated 6 years ago
sean-codes / css-settings
a settings tool for changing css properties and variables
☆14 · Mar 6, 2018 · Updated 8 years ago
jekbradbury / SpaCy.jl
Julia interface for SpaCy NLP library
☆14 · Apr 22, 2018 · Updated 8 years ago
justindujardin / prodigy-scratch
Prodigy thing(z)
☆12 · Mar 22, 2018 · Updated 8 years ago
MarcoWorms / yfu-contracts
☆10 · Oct 27, 2022 · Updated 3 years ago
ines / pretty-jekyll-skeleton
Jekyll skeleton theme for a personal blog
☆12 · May 26, 2016 · Updated 10 years ago
Massive-Wiki / massive-wiki
Massive Wiki - wikis made of Markdown Shared Versioned Files
☆14 · Jun 23, 2026 · Updated 3 weeks ago
tiangolo / markdown-include-variants
Markdown extension to expand directives to include source example files to also include their variants. Only useful to tiangolo's projets…
☆18 · Jul 13, 2026 · Updated last week
centre-for-humanities-computing / DaCy
DaCy: The State of the Art Danish NLP pipeline using SpaCy
☆104 · Jun 11, 2026 · Updated last month
ohenrik / nb_dep_ud_sm
Spacy model trained based on Norwegian corpus converted from OBT to Universal dep.
☆13 · Jan 31, 2018 · Updated 8 years ago
guibressan / prosody-pod
A Prosody XMPP plug and play server
☆11 · Apr 25, 2024 · Updated 2 years ago
DCGM / SoftCTC
This repository contains source codes for SoftCTC. Original paper can be found here: https://arxiv.org/abs/2212.02135
☆19 · Mar 7, 2023 · Updated 3 years ago
MantisAI / sieves
Plug-and-play document AI with zero-shot models.
☆126 · May 11, 2026 · Updated 2 months ago
danieldk / citar
Citar HMM part-of-speech tagger
☆15 · Aug 29, 2018 · Updated 7 years ago
explosion / confection
Confection: the sweetest config system for Python
☆194 · Mar 27, 2026 · Updated 3 months ago
alexandrainst / alexandra-ml-template
Template for Python-based data science projects in the Alexandra Institute.
☆12 · Jun 10, 2026 · Updated last month
jacquerie / biorxiv-cli
A Python wrapper for the bioRxiv API.
☆11 · Aug 18, 2021 · Updated 4 years ago
dcpedit / kinesismod
☆12 · Apr 12, 2024 · Updated 2 years ago
Wiz-IO / framework-wizio-pico
framework-wizio-pico
☆13 · Oct 22, 2022 · Updated 3 years ago
BIDS-Xu-Lab / Biomedical-NLP-Benchmarks
Benchmark Datasets for BioNLP Tasks
☆17 · May 7, 2025 · Updated last year
rust-ml / nlp-discussion
☆15 · May 8, 2019 · Updated 7 years ago
engineervix / readme-coverage-badger
automatically generates your project's coverage badge using the shields.io service, and then updates your README
☆12 · Updated this week
Proteusiq / hisia
ML Powered Danish Sentiment Model
☆14 · Jun 4, 2024 · Updated 2 years ago
rug-compling / conllu-viewer
A web-based viewer for documents in the CoNLL-U format
☆17 · Aug 11, 2021 · Updated 4 years ago
DistrictDataLabs / minke
Graph extraction and NLP analysis for Baleen Corpora
☆18 · Sep 8, 2016 · Updated 9 years ago