Lightweight piece tokenization library
☆12 · Apr 15, 2024 · Updated last year
Alternatives and similar repositories for curated-tokenizers
Users who are interested in curated-tokenizers compare it to the libraries listed below.
- Wrapper for the macOS signpost API ☆16 · Apr 24, 2023 · Updated 2 years ago
- Modular Rust transformer/LLM library using Candle ☆38 · May 5, 2024 · Updated last year
- Central hub for demos, code snippets, and other assets for Azure Cosmos DB for AI apps. ☆13 · Apr 9, 2025 · Updated 11 months ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆36 · Mar 27, 2024 · Updated last year
- CMU Linguistic Annotation Backend ☆15 · Sep 22, 2025 · Updated 6 months ago
- Converts Quora's new NLU dataset to SNLI txt/jsonl format, plus test/dev split, tokenization. ☆14 · Jan 27, 2017 · Updated 9 years ago
- Trying to deconstruct RWKV in understandable terms ☆14 · May 6, 2023 · Updated 2 years ago
- ☆17 · Jan 5, 2023 · Updated 3 years ago
- Fine-grained sentiment annotations of NoReC ☆20 · Aug 1, 2022 · Updated 3 years ago
- Plug-and-play document AI with zero-shot models. ☆125 · Feb 16, 2026 · Updated last month
- Experimental plugin to add support for RSS and JSON feeds to TiddlyWiki ☆10 · Jan 9, 2022 · Updated 4 years ago
- Read and modify constituency trees in Rust. ☆10 · May 5, 2020 · Updated 5 years ago
- A conda-smithy repository for spacy. ☆14 · Updated this week
- Benchmark Datasets for BioNLP Tasks ☆17 · May 7, 2025 · Updated 10 months ago
- A raspberry pi 64bit image with spacy and neuralcoref pre-installed ☆21 · Oct 16, 2019 · Updated 6 years ago
- a settings tool for changing css properties and variables ☆14 · Mar 6, 2018 · Updated 8 years ago
- KenLM extension for spaCy 2.0. ☆16 · Dec 6, 2017 · Updated 8 years ago
- ☆10 · Oct 27, 2022 · Updated 3 years ago
- Julia interface for SpaCy NLP library ☆14 · Apr 22, 2018 · Updated 7 years ago
- Prodigy thing(z) ☆12 · Mar 22, 2018 · Updated 8 years ago
- Jekyll skeleton theme for a personal blog ☆12 · May 26, 2016 · Updated 9 years ago
- Massive Wiki - wikis made of Markdown Shared Versioned Files ☆14 · Updated this week
- A pre-commit hook for Pyrefly. ☆23 · Mar 12, 2026 · Updated last week
- Kernel sources for https://huggingface.co/kernels-community ☆80 · Updated this week
- Markdown extension to expand directives to include source example files to also include their variants. Only useful to tiangolo's projects… ☆15 · Mar 15, 2026 · Updated last week
- DaCy: The State of the Art Danish NLP pipeline using SpaCy ☆100 · Dec 26, 2024 · Updated last year
- Spacy model trained based on Norwegian corpus converted from OBT to Universal dep. ☆13 · Jan 31, 2018 · Updated 8 years ago
- A Prosody XMPP plug and play server ☆11 · Apr 25, 2024 · Updated last year
- This repository contains source codes for SoftCTC. Original paper can be found here: https://arxiv.org/abs/2212.02135 ☆19 · Mar 7, 2023 · Updated 3 years ago
- ☆26 · Nov 18, 2025 · Updated 4 months ago
- CLI to manage internationalizing your Titanium app ☆24 · Aug 13, 2025 · Updated 7 months ago
- Confection: the sweetest config system for Python ☆193 · Updated this week
- Citar HMM part-of-speech tagger ☆15 · Aug 29, 2018 · Updated 7 years ago
- A Python wrapper for the bioRxiv API. ☆10 · Aug 18, 2021 · Updated 4 years ago
- ☆12 · Apr 12, 2024 · Updated last year
- framework-wizio-pico ☆14 · Oct 22, 2022 · Updated 3 years ago
- ☆15 · May 8, 2019 · Updated 6 years ago
- automatically generates your project's coverage badge using the shields.io service, and then updates your README ☆12 · Updated this week
- Template for Python-based data science projects in the Alexandra Institute. ☆12 · Mar 9, 2026 · Updated last week