NimbleEdge / sparse_transformers
Sparse inference for transformer-based LLMs
☆215 · Updated 3 months ago
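For readers new to the topic, the idea behind sparse transformer inference is that, for any given token, only a small fraction of a transformer's FFN neurons produce significant activations, so most of the FFN computation can be skipped. Below is a minimal PyTorch sketch of that general idea; the function name, shapes, `keep_frac`, and the top-k selection are illustrative assumptions, not this repository's actual API or kernels.

```python
import torch

def sparse_ffn(x, w_up, w_down, keep_frac=0.1):
    # Toy contextual-sparsity FFN (illustrative, not sparse_transformers' API):
    # keep only the most active neurons for this token in the down-projection.
    h = torch.relu(w_up @ x)              # (d_ff,) neuron activations
    k = max(1, int(keep_frac * h.numel()))
    idx = torch.topk(h, k).indices        # indices of the k most active neurons
    # A real kernel would predict `idx` cheaply and skip the dense matmul
    # above, loading only the needed rows of w_up and columns of w_down.
    return w_down[:, idx] @ h[idx]        # (d_model,) same shape as dense FFN

# Illustrative usage with made-up shapes:
d_model, d_ff = 64, 256
x = torch.randn(d_model)
w_up, w_down = torch.randn(d_ff, d_model), torch.randn(d_model, d_ff)
y = sparse_ffn(x, w_up, w_down)
```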
Alternatives and similar repositories for sparse_transformers
Users interested in sparse_transformers are comparing it to the libraries listed below.
- LLM Inference on consumer devices ☆125 · Updated 8 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 · Updated last month
- ☆135 · Updated 7 months ago
- ☆64 · Updated 5 months ago
- AI management tool ☆121 · Updated last year
- Lightweight toolkit to train and fine-tune 1.58-bit language models ☆100 · Updated 6 months ago
- ☆62 · Updated 4 months ago
- InferX: Inference as a Service Platform ☆141 · Updated this week
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆129 · Updated 3 weeks ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX. ☆99 · Updated 5 months ago
- ☆414 · Updated 3 weeks ago
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, No GPUs. ☆104 · Updated last month
- ☆176 · Updated 3 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆569 · Updated last week
- Enhancing LLMs with LoRA ☆177 · Updated last month
- Chat WebUI is an easy-to-use interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … ☆46 · Updated 3 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆527 · Updated 2 weeks ago
- ☆61 · Updated 6 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆155 · Updated last week
- Easy-to-use, high-performance knowledge distillation for LLMs ☆97 · Updated 7 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 · Updated 10 months ago
- ☆108 · Updated 3 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆84 · Updated this week
- A platform to self-host AI on easy mode ☆178 · Updated this week
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 · Updated last week
- ☆158 · Updated 7 months ago
- 1.58-bit LLaMa model ☆83 · Updated last year
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆146 · Updated 5 months ago
- Easily view and modify JSON datasets for large language models ☆84 · Updated 6 months ago
- Samples of good AI-generated CUDA kernels ☆92 · Updated 6 months ago