Datamodels for Hugging Face tokenizers
☆104 · Mar 12, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below.
- ☆44 · Feb 11, 2026 · Updated last month
- Work with static vector models ☆38 · Apr 21, 2025 · Updated 11 months ago
- Label shift estimation for transfer difficulty with Familiarity. ☆10 · Feb 4, 2025 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆28 · Nov 30, 2024 · Updated last year
- Nearly Inference Free Embeddings: make your RAG queries 500x faster ☆70 · Feb 20, 2026 · Updated last month
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- ANE accelerated embedding models! ☆20 · Dec 11, 2024 · Updated last year
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) ☆22 · Apr 11, 2020 · Updated 5 years ago
- BPE modification that removes intermediate tokens during tokenizer training. ☆27 · Nov 25, 2024 · Updated last year
- Hugging Face Jobs ☆19 · Jul 11, 2025 · Updated 8 months ago
- Trade any tensors over the network ☆31 · Sep 27, 2023 · Updated 2 years ago
- decontamination ☆27 · Mar 4, 2026 · Updated 3 weeks ago
- Pre-train Static Word Embeddings ☆95 · Updated this week
- A tiny BERT for low-resource monolingual models ☆31 · Dec 24, 2025 · Updated 3 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆15 · Oct 16, 2023 · Updated 2 years ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆72 · Feb 29, 2024 · Updated 2 years ago
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding" ☆29 · Jun 2, 2021 · Updated 4 years ago
- Small Python package to measure OCR quality and other related metrics. ☆27 · Feb 19, 2024 · Updated 2 years ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. ☆31 · May 11, 2020 · Updated 5 years ago
- Demo server for TREC LiveQA competition ☆11 · Dec 7, 2016 · Updated 9 years ago
- A utility for async batch jobs in marimo ☆13 · Mar 12, 2025 · Updated last year
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods" ☆13 · Nov 26, 2024 · Updated last year
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper ☆14 · Aug 9, 2021 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · May 10, 2023 · Updated 2 years ago
- MSPaint for marimo and other Python notebooks ☆24 · Oct 24, 2025 · Updated 5 months ago
- Import hook for maturin ☆18 · Dec 23, 2025 · Updated 3 months ago
- Just some FastHTML demos for safekeeps ☆13 · Dec 10, 2024 · Updated last year
- ☆12 · Mar 17, 2026 · Updated last week
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more. ☆19 · Dec 11, 2025 · Updated 3 months ago
- Bayesian probability transforms for BM25 retrieval scores ☆62 · Updated this week
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆83 · Feb 10, 2026 · Updated last month
- ☆57 · Dec 27, 2025 · Updated 3 months ago
- Adds gamepad support to marimo/Python ☆40 · Jun 28, 2025 · Updated 8 months ago
- ☆84 · Nov 21, 2025 · Updated 4 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Command Line Interface for Hugging Face Inference Endpoints ☆65 · Apr 10, 2024 · Updated last year
- ☆108 · Jun 2, 2025 · Updated 9 months ago
- Statistics on multilingual datasets ☆17 · Jul 12, 2022 · Updated 3 years ago