NimbleEdge / sparse_transformers
Sparse inferencing for transformer-based LLMs
☆217 · Updated 5 months ago
Alternatives and similar repositories for sparse_transformers
Users who are interested in sparse_transformers are comparing it to the libraries listed below.
- LLM Inference on consumer devices ☆128 · Updated 10 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 · Updated 2 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆203 · Updated last month
- ☆69 · Updated 6 months ago
- InferX: Inference as a Service Platform ☆146 · Updated this week
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆106 · Updated 7 months ago
- ☆178 · Updated 5 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆560 · Updated last month
- ☆109 · Updated 6 months ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆589 · Updated last month
- ☆62 · Updated 6 months ago
- ☆431 · Updated last month
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆154 · Updated 6 months ago
- ☆134 · Updated 8 months ago
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, No GPUs. ☆108 · Updated 2 months ago
- ☆158 · Updated 9 months ago
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 · Updated last month
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆587 · Updated 3 weeks ago
- Enhancing LLMs with LoRA ☆205 · Updated 2 months ago
- Easy to use, High Performant Knowledge Distillation for LLMs ☆96 · Updated 8 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 · Updated 11 months ago
- ☆104 · Updated 2 months ago
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆356 · Updated last week
- AI management tool ☆119 · Updated last year
- ☆243 · Updated 3 months ago
- Efficient non-uniform quantization with GPTQ for GGUF ☆57 · Updated 4 months ago
- ☆63 · Updated 8 months ago
- ☆34 · Updated 9 months ago
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆38 · Updated last month
- automatically quant GGUF models ☆220 · Updated 3 weeks ago