NimbleEdge / sparse_transformers
Sparse Inferencing for transformer based LLMs
☆197 Updated 2 weeks ago
Alternatives and similar repositories for sparse_transformers
Users that are interested in sparse_transformers are comparing it to the libraries listed below
- ☆134 Updated 3 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆42 Updated last month
- LLM Inference on consumer devices ☆124 Updated 5 months ago
- InferX is an Inference Function as a Service Platform ☆128 Updated this week
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆145 Updated 2 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆100 Updated last month
- ☆162 Updated 2 weeks ago
- AI management tool ☆118 Updated 9 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆73 Updated this week
- A pipeline parallel training script for LLMs. ☆153 Updated 3 months ago
- automatically quant GGUF models ☆195 Updated last week
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆80 Updated 11 months ago
- ☆95 Updated this week
- Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆98 Updated 3 weeks ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆83 Updated 3 months ago
- Easy to use, High Performant Knowledge Distillation for LLMs ☆92 Updated 3 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆26 Updated 3 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 Updated 6 months ago
- ☆155 Updated 4 months ago
- Lightweight Inference server for OpenVINO ☆198 Updated this week
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆518 Updated 2 weeks ago
- VLLM Port of the Chatterbox TTS model ☆277 Updated last week
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆271 Updated this week
- Enhancing LLMs with LoRA ☆50 Updated 2 weeks ago
- B-Llama3o a llama3 with Vision Audio and Audio understanding as well as text and Audio and Animation Data output. ☆26 Updated last year
- ☆30 Updated 5 months ago
- ☆158 Updated last week
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆27 Updated 2 weeks ago
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX. ☆92 Updated last month
- Dia-JAX: A JAX port of Dia, the text-to-speech model for generating realistic dialogue from text with emotion and tone control. ☆27 Updated 3 months ago