huawei-csl / SINQ
Welcome to the official repository of SINQ, a novel, fast, and high-quality quantization method designed to make any large language model smaller while preserving accuracy.
☆579 · Updated 2 weeks ago
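SINQ's actual algorithm isn't reproduced on this page, but the family it belongs to, weight-only quantization, can be sketched in a few lines. The snippet below is a generic illustration (not SINQ's method): symmetric per-group quantization that maps float weights to low-bit signed integers with one float scale per group, then dequantizes them at inference time. All names here (`quantize_group`, `dequantize_group`) are hypothetical.

```python
# Illustrative weight-only quantization sketch (NOT SINQ's actual algorithm):
# symmetric per-group 4-bit quantization with a single float scale per group.

def quantize_group(weights, bits=4):
    """Map a group of float weights to signed ints in [-(2^(b-1)-1), 2^(b-1)-1]."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid div-by-zero for all-zero groups
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from integer codes and the group scale."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.34, 0.91, -0.27, 0.05, -0.88, 0.44]
q, scale = quantize_group(weights)
approx = dequantize_group(q, scale)
# Rounding error per weight is bounded by about scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q, scale, max_err)
```

Storing 4-bit codes plus one scale per group is what shrinks the model; methods like SINQ differ in how they choose scales and codes to keep this rounding error from degrading accuracy.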
Alternatives and similar repositories for SINQ
Users interested in SINQ are comparing it to the libraries listed below.
- ☆712 · Updated last week
- Sparse inferencing for transformer-based LLMs ☆215 · Updated 3 months ago
- ☆414 · Updated 3 weeks ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆569 · Updated 2 weeks ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆527 · Updated 3 weeks ago
- ☆1,226 · Updated 3 weeks ago
- ☆144 · Updated last week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆247 · Updated last month
- Docs for GGUF quantization (unofficial) ☆330 · Updated 4 months ago
- No-code CLI designed for accelerating ONNX workflows ☆219 · Updated 5 months ago
- Research code artifacts for Code World Model (CWM), including inference tools, reproducibility, and documentation ☆751 · Updated 2 months ago
- Benchmark and optimize LLM inference across frameworks with ease ☆141 · Updated 2 months ago
- ☆145 · Updated 4 months ago
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints ☆260 · Updated last week
- InferX: Inference as a Service platform ☆142 · Updated this week
- ☆127 · Updated 3 months ago
- An open-source implementation of Whisper ☆466 · Updated last month
- ☆533 · Updated this week
- ☆301 · Updated 4 months ago
- Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, adaptive schemes, and seamless integration wi… ☆753 · Updated this week
- vLLM port of the Chatterbox TTS model ☆342 · Updated last month
- Enhancing LLMs with LoRA ☆191 · Updated last month
- The code repository for the paper "Competition and Attraction Improve Model Fusion" ☆167 · Updated 3 months ago
- Plug-and-play memory for LLMs in 3 lines of code. Add persistent, intelligent, human-like memory and recall to any model in minutes ☆217 · Updated 2 weeks ago
- ☆176 · Updated 4 months ago
- VPTQ: a flexible, extreme low-bit quantization algorithm ☆668 · Updated 7 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,827 · Updated this week
- A command-line interface tool for serving LLMs using vLLM ☆454 · Updated last week
- LLM inference on consumer devices ☆125 · Updated 8 months ago
- Tencent Hunyuan-A13B (Hunyuan-A13B for short), an innovative open-source LLM built on a fine-grained MoE architecture ☆808 · Updated 5 months ago