NimbleEdge / sparse_transformers
Sparse Inferencing for transformer-based LLMs
☆215 Updated 4 months ago
Alternatives and similar repositories for sparse_transformers
Users who are interested in sparse_transformers are comparing it to the libraries listed below.
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 Updated last month
- LLM Inference on consumer devices ☆128 Updated 9 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆151 Updated 2 weeks ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆104 Updated 7 months ago
- ☆66 Updated 6 months ago
- ☆62 Updated 5 months ago
- InferX: Inference as a Service Platform ☆143 Updated this week
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆576 Updated last month
- ☆176 Updated 4 months ago
- Enhancing LLMs with LoRA ☆197 Updated 2 months ago
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, No GPUs. ☆107 Updated last month
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆172 Updated 3 weeks ago
- ☆159 Updated 6 months ago
- ☆109 Updated 4 months ago
- 1.58-bit LLaMa model ☆83 Updated last year
- ☆426 Updated 3 weeks ago
- ☆135 Updated 7 months ago
- Samples of good AI generated CUDA kernels ☆95 Updated 6 months ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆585 Updated this week
- Easy to use, High Performant Knowledge Distillation for LLMs ☆97 Updated 7 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆553 Updated last month
- ☆34 Updated 9 months ago
- RWKV-7: Surpassing GPT ☆101 Updated last year
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 Updated 10 months ago
- ☆63 Updated 7 months ago
- Efficient non-uniform quantization with GPTQ for GGUF ☆57 Updated 3 months ago
- automatically quant GGUF models ☆219 Updated 2 months ago
- A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size ☆81 Updated 3 months ago
- ☆159 Updated 8 months ago
- NVIDIA Linux open GPU with P2P support ☆98 Updated 3 weeks ago