Datamodels for Hugging Face tokenizers
☆107 · Apr 28, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below.
- 🔢 Work with static vector models ☆39 · Apr 21, 2025 · Updated last year
- Label shift estimation for transfer difficulty with Familiarity. ☆10 · Feb 4, 2025 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆28 · Nov 30, 2024 · Updated last year
- PANiC - PAraphrasing Noun-Compounds ☆15 · Apr 6, 2018 · Updated 8 years ago
- Nearly Inference Free Embeddings: make your RAG queries 500x faster ☆77 · Apr 27, 2026 · Updated last month
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- ANE accelerated embedding models! ☆19 · Dec 11, 2024 · Updated last year
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) ☆22 · Apr 11, 2020 · Updated 6 years ago
- Hugging Face Jobs ☆20 · Jul 11, 2025 · Updated 10 months ago
- 🤝 Trade any tensors over the network ☆31 · Sep 27, 2023 · Updated 2 years ago
- Legalpioneer dataset ☆15 · Apr 10, 2025 · Updated last year
- Pre-train Static Word Embeddings ☆102 · May 4, 2026 · Updated 3 weeks ago
- A tiny BERT for low-resource monolingual models ☆32 · Dec 24, 2025 · Updated 5 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆72 · Feb 29, 2024 · Updated 2 years ago
- Small Python package to measure OCR quality and other related metrics. ☆26 · Feb 19, 2024 · Updated 2 years ago
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding" ☆29 · Jun 2, 2021 · Updated 4 years ago
- decontamination ☆33 · Mar 4, 2026 · Updated 2 months ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. ☆31 · May 11, 2020 · Updated 6 years ago
- Python client SDK for Ultravox. ☆16 · Dec 10, 2025 · Updated 5 months ago
- Experimental Marimo extension for Agentic Notebooks -- integrating AI Agents into the Notebook workflow ☆15 · Oct 11, 2025 · Updated 7 months ago
- Demo server for TREC LiveQA competition ☆11 · Dec 7, 2016 · Updated 9 years ago
- ☆68 · Jan 28, 2026 · Updated 3 months ago
- A utility for async batch jobs in marimo ☆13 · Mar 12, 2025 · Updated last year
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods" ☆13 · Nov 26, 2024 · Updated last year
- ☆14 · Jul 10, 2021 · Updated 4 years ago
- Generalist and Lightweight Model for Text Classification ☆217 · May 19, 2026 · Updated last week
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · May 10, 2023 · Updated 3 years ago
- MSPaint for marimo and other Python notebooks ☆25 · Oct 24, 2025 · Updated 7 months ago
- Import hook for maturin ☆18 · Dec 23, 2025 · Updated 5 months ago
- Just some FastHTML demos for safekeeps ☆13 · Dec 10, 2024 · Updated last year
- Finds snippets in iambic pentameter in English-language text and tries to combine them into a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more. ☆19 · Dec 11, 2025 · Updated 5 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆87 · Feb 10, 2026 · Updated 3 months ago
- ☆57 · Dec 27, 2025 · Updated 5 months ago
- Bayesian probability transforms for BM25 retrieval scores ☆75 · Mar 28, 2026 · Updated last month
- Complex Systems 530 - Computer Modeling of Complex Systems (Winter 2016) ☆15 · Apr 15, 2016 · Updated 10 years ago
- Chunk Dedupe Estimation ☆20 · Nov 5, 2024 · Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Command Line Interface for Hugging Face Inference Endpoints ☆65 · Apr 10, 2024 · Updated 2 years ago