NimbleEdge / sparse_transformersLinks

Sparse Inferencing for transformer based LLMs

☆196

Alternatives and similar repositories for sparse_transformers

Users that are interested in sparse_transformers are comparing it to the libraries listed below

Sorting:

inferx-net / inferx
InferX is a Inference Function as a Service Platform
☆119Updated last week
LeanModels / DFloat11
DFloat11: Lossless LLM Compression for Efficient GPU Inference
☆464Updated this week
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆39Updated 2 weeks ago
Infini-AI-Lab / UMbreLLa
LLM Inference on consumer devices
☆123Updated 4 months ago
jd-3d / SOLOBench
☆132Updated 3 months ago
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆97Updated last month
TesslateAI / TFrameX
☆152Updated last week
snu-mllab / KVzip
Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆93Updated last week
TheProxyCompany / proxy-structuring-engine
Guaranteed Structured Output from any Language Model via Hierarchical State Machines
☆141Updated 2 months ago
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆67Updated last month
av / klmbr
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
☆78Updated 10 months ago
TC-Zheng / ActuosusAI
AI management tool
☆118Updated 8 months ago
ScalingIntelligence / tokasaurus
☆388Updated this week
fidecastro / llama-cpp-connector
Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run!
☆25Updated 2 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆81Updated 2 months ago
chigkim / Ollama-MMLU-Pro
☆94Updated 7 months ago
iluxu / llmbasedos
Minimal Linux OS with a Model Context Protocol (MCP) gateway to expose local capabilities to LLMs.
☆259Updated last month
tdrussell / qlora-pipe
A pipeline parallel training script for LLMs.
☆153Updated 3 months ago
SearchSavior / OpenArc
Lightweight Inference server for OpenVINO
☆191Updated last week
menloresearch / ReZero
☆155Updated 3 months ago
nath1295 / MLX-Textgen
A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX.
☆90Updated last month
reka-ai / rekaquant
☆58Updated 3 weeks ago
Cornell-RelaxML / yaqa-quantization
☆42Updated last month
Lizonghang / TPI-LLM
TPI-LLM: Serving 70b-scale LLMs Efficiently on Low-resource Edge Devices
☆186Updated 2 months ago
bold84 / cot_proxy
Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and…
☆49Updated 2 months ago
Viceman256 / TensorTune
KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning
☆26Updated 2 months ago
deepgrove-ai / Bonsai
☆27Updated 4 months ago
mzbac / mlx_sharding
Distributed Inference for mlx LLm
☆94Updated last year
severian42 / Computational-Model-for-Symbolic-Representations
Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI …
☆50Updated 5 months ago
beyondExp / B-Llama3-o
B-Llama3o a llama3 with Vision Audio and Audio understanding as well as text and Audio and Animation Data output.
☆26Updated last year