Data models for Hugging Face tokenizers
☆106Apr 12, 2026Updated this week
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below.
- ☆44Feb 11, 2026Updated 2 months ago
- Label shift estimation for transfer difficulty with Familiarity.☆10Feb 4, 2025Updated last year
- Code for SaGe subword tokenizer (EACL 2023)☆28Nov 30, 2024Updated last year
- PANiC - PAraphrasing Noun-Compounds☆15Apr 6, 2018Updated 8 years ago
- Nearly Inference Free Embeddings: make your RAG queries 500x faster☆74Feb 20, 2026Updated last month
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks☆12Nov 9, 2021Updated 4 years ago
- ANE accelerated embedding models!☆20Dec 11, 2024Updated last year
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME)☆22Apr 11, 2020Updated 6 years ago
- BPE modification that removes intermediate tokens during tokenizer training.☆27Nov 25, 2024Updated last year
- Hugging Face Jobs☆19Jul 11, 2025Updated 9 months ago
- 🤝 Trade any tensors over the network☆31Sep 27, 2023Updated 2 years ago
- ☆20Oct 5, 2025Updated 6 months ago
- Pre-train Static Word Embeddings☆98Mar 27, 2026Updated 2 weeks ago
- A tiny BERT for low-resource monolingual models☆31Dec 24, 2025Updated 3 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆72Feb 29, 2024Updated 2 years ago
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding"☆29Jun 2, 2021Updated 4 years ago
- Small python package to measure OCR quality and other related metrics.☆27Feb 19, 2024Updated 2 years ago
- decontamination☆30Mar 4, 2026Updated last month
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant.☆31May 11, 2020Updated 5 years ago
- Python client SDK for Ultravox.☆16Dec 10, 2025Updated 4 months ago
- Experimental Marimo extension for Agentic Notebooks -- integrating AI Agents into the Notebook workflow☆14Oct 11, 2025Updated 6 months ago
- A utility for async batch jobs in marimo☆13Mar 12, 2025Updated last year
- Generalist and Lightweight Model for Text Classification☆208Feb 17, 2026Updated last month
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"☆13Nov 26, 2024Updated last year
- Code for the ILNewsDiff Twitter account☆10May 23, 2023Updated 2 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)☆18May 10, 2023Updated 2 years ago
- MSPaint for marimo and other Python notebooks☆24Oct 24, 2025Updated 5 months ago
- Just some FastHTML demos for safekeeps☆13Dec 10, 2024Updated last year
- ☆12Mar 17, 2026Updated 3 weeks ago
- Local LLM as a search relevance judge☆28Mar 2, 2025Updated last year
- Bayesian probability transforms for BM25 retrieval scores☆72Mar 28, 2026Updated 2 weeks ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism.☆84Feb 10, 2026Updated 2 months ago
- ☆57Dec 27, 2025Updated 3 months ago
- 3D geoms for plotnine (grammar of graphics in Python)☆13Aug 5, 2022Updated 3 years ago
- In this project, we propose to study Vision Transformers trained using the Barlow Twins self-supervised method, and compare the results w…☆16Oct 3, 2023Updated 2 years ago
- ☆86Nov 21, 2025Updated 4 months ago
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- Command Line Interface for Hugging Face Inference Endpoints☆65Apr 10, 2024Updated 2 years ago
- ☆109Jun 2, 2025Updated 10 months ago