explosion/tokenizations

explosion / tokenizations

Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/

☆195

Alternatives and similar repositories for tokenizations

Users who are interested in tokenizations are comparing it to the libraries listed below.

tamuhey / tokenizations
Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/
☆30 · Jul 12, 2021 · Updated 5 years ago
explosion / spacy-alignments
💫 A spaCy package for Yohei Tamura's Rust tokenizations library
☆35 · Mar 27, 2026 · Updated 3 months ago
megagonlabs / ginza-transformers
Use custom tokenizers in spacy-transformers
☆16 · Aug 9, 2022 · Updated 3 years ago
tamuhey / textspan
Text span utilities for Rust and Python
☆23 · Jan 3, 2023 · Updated 3 years ago
explosion / spacy-huggingface-pipelines
💥 Use Hugging Face text and token classification pipelines directly in spaCy
☆65 · Mar 18, 2024 · Updated 2 years ago
megagonlabs / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆28 · Mar 24, 2023 · Updated 3 years ago
explosion / spacy-transformers
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
☆1,408 · Mar 27, 2026 · Updated 3 months ago
msg-systems / coreferee
Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang…
☆198 · Dec 18, 2022 · Updated 3 years ago
explosion / floret
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
☆343 · Apr 25, 2025 · Updated last year
tomaarsen / module_dependencies
Gather module dependencies of source code
☆13 · Jul 21, 2023 · Updated 2 years ago
KoichiYasuoka / GuwenCOMBO
Tokenizer, POS-tagger and dependency-parser for Classical Chinese
☆15 · Dec 30, 2025 · Updated 6 months ago
ITUnlp / UniParse
UniParse: A universal graph-based parsing toolkit
☆11 · Oct 2, 2019 · Updated 6 years ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated 2 years ago
explosion / srsly
🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle)
☆484 · Mar 27, 2026 · Updated 3 months ago
PKSHATechnology-Research / camphr
Camphr - NLP library for creating pipeline components
☆336 · Dec 9, 2022 · Updated 3 years ago
NISTEP / minutes
Meeting minutes metadata set
☆12 · Jun 10, 2018 · Updated 8 years ago
koaning / tokenwiser
Bag of, not words, but tricks!
☆68 · Jun 11, 2026 · Updated last month
explosion / jupyterlab-prodigy
🧬 A JupyterLab extension for annotating data with Prodigy
☆190 · May 10, 2023 · Updated 3 years ago
explosion / spacy-llm
🦙 Integrating LLMs into structured NLP pipelines
☆1,392 · Mar 27, 2026 · Updated 3 months ago
copenlu / scientific-information-change
Code for the paper "Modeling Information Change in Science Communication with Semantically Matched Paraphrases" from EMNLP 2022
☆13 · Oct 20, 2022 · Updated 3 years ago
masayu-a / NAIST-JENE
☆10 · Aug 13, 2012 · Updated 13 years ago
facebookresearch / CMR
N/A
☆19 · Aug 15, 2022 · Updated 3 years ago
UKPLab / tmlr2026-manifold-analysis
☆21 · Mar 3, 2026 · Updated 4 months ago
microsoft / spacy-ann-linker
spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking
☆86 · Oct 6, 2022 · Updated 3 years ago
SapienzaNLP / wsd-hard-benchmark
Data and code for "Nibbling at the Hard Core of Word Sense Disambiguation" (ACL 2022).
☆16 · Mar 25, 2022 · Updated 4 years ago
paulrinckens / timexy
A spaCy custom component that extracts and normalizes temporal expressions
☆56 · Feb 13, 2023 · Updated 3 years ago
BatsResearch / wiser
Framework for weakly supervised deep sequence taggers, focused on named entity recognition
☆76 · Feb 10, 2023 · Updated 3 years ago
exped1230 / S2-VER
The official implementation of the paper "S2-VER: Semi-Supervised Visual Emotion Recognition"
☆11 · Apr 28, 2024 · Updated 2 years ago
ellenmellon / GraphSum
Extracting Summary Knowledge Graphs from Long Documents
☆19 · Jul 2, 2021 · Updated 5 years ago
explosion / weasel
🦦 weasel: A small and easy workflow system
☆93 · Mar 27, 2026 · Updated 3 months ago
KennethEnevoldsen / augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
☆156 · May 24, 2024 · Updated 2 years ago
alinear-corp / albert-japanese
BERT with SentencePiece for Japanese text.
☆33 · Oct 28, 2021 · Updated 4 years ago
tony-hong / roleo
Web-based semantic visualization tool
☆12 · Feb 16, 2017 · Updated 9 years ago
nlp-waseda / Kanbun-LM
Code for paper "Kanbun-LM: Reading and Translating Classical Chinese in Japanese Method by Language Models"
☆21 · Jul 10, 2023 · Updated 3 years ago
explosion / spacy-stanza
💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
☆747 · Aug 15, 2024 · Updated last year
jzbjyb / X-FACTR
☆24 · Jun 12, 2023 · Updated 3 years ago
cdpierse / transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
☆1,416 · Aug 30, 2023 · Updated 2 years ago
doccano / doccano-client
A simple client for doccano API.
☆86 · May 25, 2024 · Updated 2 years ago
explosion / spacy-experimental
🧪 Cutting-edge experimental spaCy components and features
☆104 · Apr 23, 2024 · Updated 2 years ago