Data models for Hugging Face tokenizers
★106 · Apr 28, 2026 · Updated last week
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below.
- ★45 · Feb 11, 2026 · Updated 2 months ago
- Work with static vector models · ★39 · Apr 21, 2025 · Updated last year
- Label shift estimation for transfer difficulty with Familiarity. · ★10 · Feb 4, 2025 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) · ★28 · Nov 30, 2024 · Updated last year
- Nearly Inference Free Embeddings: make your RAG queries 500x faster · ★77 · Apr 27, 2026 · Updated last week
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks · ★12 · Nov 9, 2021 · Updated 4 years ago
- ANE accelerated embedding models! · ★20 · Dec 11, 2024 · Updated last year
- Hugging Face Jobs · ★20 · Jul 11, 2025 · Updated 9 months ago
- Trade any tensors over the network · ★31 · Sep 27, 2023 · Updated 2 years ago
- Legalpioneer dataset · ★15 · Apr 10, 2025 · Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… · ★15 · Oct 16, 2023 · Updated 2 years ago
- A tiny BERT for low-resource monolingual models · ★31 · Dec 24, 2025 · Updated 4 months ago
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding" · ★29 · Jun 2, 2021 · Updated 4 years ago
- Small Python package to measure OCR quality and other related metrics. · ★27 · Feb 19, 2024 · Updated 2 years ago
- decontamination · ★30 · Mar 4, 2026 · Updated 2 months ago
- Load embeddings and featurize your sentences. · ★31 · Oct 23, 2024 · Updated last year
- Python client SDK for Ultravox. · ★16 · Dec 10, 2025 · Updated 4 months ago
- Demo server for TREC LiveQA competition · ★11 · Dec 7, 2016 · Updated 9 years ago
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods" · ★13 · Nov 26, 2024 · Updated last year
- Generalist and Lightweight Model for Text Classification · ★211 · Updated this week
- Code for the ILNewsDiff Twitter account · ★10 · May 23, 2023 · Updated 2 years ago
- MSPaint for marimo and other Python notebooks · ★25 · Oct 24, 2025 · Updated 6 months ago
- Just some FastHTML demos for safekeeps · ★13 · Dec 10, 2024 · Updated last year
- ★12 · Mar 17, 2026 · Updated last month
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. · ★13 · Jan 5, 2023 · Updated 3 years ago
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more. · ★19 · Dec 11, 2025 · Updated 4 months ago
- Local LLM as a search relevance judge · ★29 · Mar 2, 2025 · Updated last year
- 153 field-tested techniques for Claude Code: patterns, architectures, and workflows for developers and AI-native teams · ★84 · Apr 19, 2026 · Updated 2 weeks ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. · ★87 · Feb 10, 2026 · Updated 2 months ago
- Bayesian probability transforms for BM25 retrieval scores · ★75 · Mar 28, 2026 · Updated last month
- Chunk Dedupe Estimation · ★20 · Nov 5, 2024 · Updated last year
- ★86 · Nov 21, 2025 · Updated 5 months ago
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining · ★18 · Nov 26, 2023 · Updated 2 years ago
- ★26 · Jan 7, 2023 · Updated 3 years ago
- ★111 · Jun 2, 2025 · Updated 11 months ago
- A Python library aimed at dissecting and augmenting NER training data. · ★61 · May 11, 2023 · Updated 2 years ago
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER · ★22 · Jul 19, 2023 · Updated 2 years ago
- Generate fixed dimensional embeddings for multi-dimensional vectors in Python based on Muvera from Google. · ★20 · Jun 28, 2025 · Updated 10 months ago
- ★18 · Feb 4, 2025 · Updated last year