Data models for Hugging Face tokenizers
★107 · May 26, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below.
- ★45 · Feb 11, 2026 · Updated 4 months ago
- Work with static vector models (★39 · Apr 21, 2025 · Updated last year)
- Code for SaGe subword tokenizer (EACL 2023) (★28 · Nov 30, 2024 · Updated last year)
- PANiC - PAraphrasing Noun-Compounds (★15 · Apr 6, 2018 · Updated 8 years ago)
- Nearly Inference Free Embeddings: make your RAG queries 500x faster (★77 · Apr 27, 2026 · Updated last month)
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks (★12 · Nov 9, 2021 · Updated 4 years ago)
- ANE accelerated embedding models! (★19 · Dec 11, 2024 · Updated last year)
- BPE modification that implements removing of the intermediate tokens during tokenizer training. (★27 · Nov 25, 2024 · Updated last year)
- Hugging Face Jobs (★20 · Jul 11, 2025 · Updated 11 months ago)
- Trade any tensors over the network (★31 · Sep 27, 2023 · Updated 2 years ago)
- ★20 · Oct 5, 2025 · Updated 8 months ago
- A tiny BERT for low-resource monolingual models (★32 · Dec 24, 2025 · Updated 5 months ago)
- Small python package to measure OCR quality and other related metrics. (★27 · Feb 19, 2024 · Updated 2 years ago)
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding" (★29 · Jun 2, 2021 · Updated 5 years ago)
- decontamination (★33 · Mar 4, 2026 · Updated 3 months ago)
- Load embeddings and featurize your sentences. (★31 · Oct 23, 2024 · Updated last year)
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. (★31 · May 11, 2020 · Updated 6 years ago)
- Experimental Marimo extension for Agentic Notebooks -- integrating AI Agents into the Notebook workflow (★15 · Oct 11, 2025 · Updated 8 months ago)
- Demo server for TREC LiveQA competition (★11 · Dec 7, 2016 · Updated 9 years ago)
- A utility for async batch jobs in marimo (★13 · Mar 12, 2025 · Updated last year)
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods" (★13 · Nov 26, 2024 · Updated last year)
- Generalist and Lightweight Model for Text Classification (★218 · Jun 2, 2026 · Updated 2 weeks ago)
- Code for the ILNewsDiff Twitter account (★10 · May 23, 2023 · Updated 3 years ago)
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper (★14 · Aug 9, 2021 · Updated 4 years ago)
- Random Forest-based "Correlation" measures (★15 · May 3, 2022 · Updated 4 years ago)
- Import hook for maturin (★19 · Dec 23, 2025 · Updated 5 months ago)
- Just some FastHTML demos for safekeeps (★13 · Dec 10, 2024 · Updated last year)
- ★12 · Mar 17, 2026 · Updated 2 months ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. (★13 · Jan 5, 2023 · Updated 3 years ago)
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more. (★19 · Dec 11, 2025 · Updated 6 months ago)
- Truly flash implementation of DeBERTa disentangled attention mechanism. (★89 · Feb 10, 2026 · Updated 4 months ago)