stephantul / skeletoken
Data models for Hugging Face tokenizers
☆ 99 · Updated this week
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below
- ☆ 43 · Jan 13, 2026 · Updated last month
- Trade any tensors over the network · ☆ 31 · Sep 27, 2023 · Updated 2 years ago
- Work with static vector models · ☆ 36 · Apr 21, 2025 · Updated 9 months ago
- A tiny BERT for low-resource monolingual models · ☆ 31 · Dec 24, 2025 · Updated last month
- ☆ 20 · Oct 5, 2025 · Updated 4 months ago
- ANE accelerated embedding models! · ☆ 20 · Dec 11, 2024 · Updated last year
- Chunk Dedupe Estimation · ☆ 20 · Nov 5, 2024 · Updated last year
- Just some FastHTML demos for safekeeping · ☆ 13 · Dec 10, 2024 · Updated last year
- Python client SDK for Ultravox. · ☆ 16 · Dec 10, 2025 · Updated 2 months ago
- Smaller and faster nanochat in MLX · ☆ 36 · Nov 15, 2025 · Updated 2 months ago
- Legalpioneer dataset · ☆ 15 · Apr 10, 2025 · Updated 10 months ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper · ☆ 14 · Aug 9, 2021 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) · ☆ 18 · May 10, 2023 · Updated 2 years ago
- Pre-train Static Word Embeddings · ☆ 94 · Sep 9, 2025 · Updated 5 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining · ☆ 18 · Nov 26, 2023 · Updated 2 years ago
- Using open source LLMs to build synthetic datasets for direct preference optimization · ☆ 72 · Feb 29, 2024 · Updated last year
- ☆ 106 · Jun 2, 2025 · Updated 8 months ago
- A small rust-based data loader · ☆ 36 · Nov 14, 2025 · Updated 2 months ago
- A Python library aimed at dissecting and augmenting NER training data. · ☆ 60 · May 11, 2023 · Updated 2 years ago
- Difference-based Contrastive Learning for Korean Sentence Embeddings · ☆ 23 · Updated this week
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval · ☆ 38 · Aug 4, 2025 · Updated 6 months ago
- Nearly Inference Free Embeddings: make your RAG queries 500x faster · ☆ 70 · Updated this week
- CERberus -- guardian against character errors · ☆ 29 · Feb 15, 2024 · Updated last year
- ☆ 63 · Dec 29, 2025 · Updated last month
- Small python package to measure OCR quality and other related metrics. · ☆ 27 · Feb 19, 2024 · Updated last year
- Truly flash implementation of DeBERTa disentangled attention mechanism. · ☆ 76 · Jan 26, 2026 · Updated 2 weeks ago
- A collection of templates for Hugging Face Spaces · ☆ 35 · Oct 9, 2023 · Updated 2 years ago
- Efficiently find the best-suited language model (LM) for your NLP task · ☆ 134 · Jul 26, 2025 · Updated 6 months ago
- Dashboard v5 Coming Soon!! · ☆ 63 · Jan 2, 2026 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper · ☆ 31 · Jul 28, 2024 · Updated last year
- [Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation · ☆ 28 · Dec 9, 2022 · Updated 3 years ago
- The code and data for our paper (EMNLP 2023 findings) "Type-Aware Decomposed Framework for Few-Shot Named Entity Recognition". · ☆ 35 · Jul 17, 2025 · Updated 6 months ago
- User-friendly viewer for Parquet files · ☆ 10 · Jan 10, 2026 · Updated last month
- ☆ 14 · Dec 5, 2025 · Updated 2 months ago
- TimeSeries Java client for Facebook Beringei. It also includes a query service with tag support for metrics. · ☆ 10 · May 13, 2017 · Updated 8 years ago
- MetaLearners for CATE estimation · ☆ 48 · Feb 3, 2026 · Updated last week
- Python test runner built in Rust · ☆ 17 · Updated this week
- Train LLM on Hugging Face infra · ☆ 67 · Nov 13, 2025 · Updated 3 months ago
- A complete pipeline for fine-tuning YOLOv8 pose models with custom datasets. Supports automatic and semi-automatic annotation for efficie… · ☆ 15 · Feb 9, 2025 · Updated last year