NimbleEdge / sparse_transformers
Sparse inference for transformer-based LLMs
☆218 · Updated 5 months ago
Alternatives and similar repositories for sparse_transformers
Users interested in sparse_transformers are comparing it to the libraries listed below.
- LLM inference on consumer devices ☆129 · Updated 10 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding without retraining. ☆49 · Updated 3 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆222 · Updated last month
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆110 · Updated 8 months ago
- InferX: Inference as a Service platform ☆154 · Updated this week
- ☆71 · Updated 7 months ago
- ☆62 · Updated 6 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆97 · Updated 9 months ago
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 · Updated last month
- ☆178 · Updated 5 months ago
- ☆135 · Updated 9 months ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆600 · Updated 2 months ago
- Enhancing LLMs with LoRA ☆206 · Updated 3 months ago
- Training-free post-training efficient sub-quadratic complexity attention. Implemented with OpenAI Triton. ☆148 · Updated 3 months ago
- ☆159 · Updated 9 months ago
- ☆163 · Updated 7 months ago
- Efficient non-uniform quantization with GPTQ for GGUF ☆58 · Updated 4 months ago
- ☆109 · Updated 5 months ago
- SINQ: a novel, fast, high-quality quantization method designed to make any Large Language Model … ☆590 · Updated 3 weeks ago
- ☆113 · Updated 2 months ago
- AI management tool ☆119 · Updated last year
- Testing LLM reasoning abilities with family-relationship quizzes. ☆63 · Updated last year
- ☆64 · Updated 8 months ago
- ☆440 · Updated 2 months ago
- Run multiple resource-heavy large models (LMs) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated this week
- A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size ☆82 · Updated 5 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆156 · Updated 7 months ago
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 · Updated 8 months ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX. ☆100 · Updated 7 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆569 · Updated 2 months ago