huawei-csl / SINQ
Welcome to the official repository of SINQ, a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
☆541 · Updated last week
Alternatives and similar repositories for SINQ
Users interested in SINQ are comparing it to the libraries listed below.
- ☆695 · Updated 2 weeks ago
- Sparse inference for transformer-based LLMs ☆201 · Updated 2 months ago
- ☆445 · Updated this week
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆554 · Updated 2 months ago
- Docs for GGUF quantization (unofficial) ☆293 · Updated 3 months ago
- Enhancing LLMs with LoRA ☆172 · Updated last week
- InferX: Inference as a Service Platform ☆137 · Updated this week
- ☆144 · Updated 2 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,512 · Updated this week
- It takes a village to raise a child: Google DeepThink 🧠 but in LangGraph and free - an original algorithm for collaborative agents using… ☆128 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance ☆1,277 · Updated this week
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints. ☆226 · Updated this week
- No-code CLI designed to accelerate ONNX workflows ☆215 · Updated 4 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining ☆43 · Updated this week
- ☆300 · Updated 2 months ago
- ☆93 · Updated 3 weeks ago
- Tencent Hunyuan A13B (Hunyuan-A13B for short), an innovative open-source LLM built on a fine-grained MoE architecture ☆800 · Updated 3 months ago
- Benchmark and optimize LLM inference across frameworks with ease ☆124 · Updated last month
- Automatically quantize GGUF models ☆214 · Updated last week
- ☆168 · Updated 2 months ago
- Research code artifacts for Code World Model (CWM), including inference tools, reproducibility scripts, and documentation ☆682 · Updated last month
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆195 · Updated this week
- A tree search library with a flexible API for LLM inference-time scaling ☆479 · Updated last week
- A command-line interface tool for serving LLMs using vLLM ☆433 · Updated last week
- Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆656 · Updated 3 weeks ago
- MiniMax-M2, a Mini model built for Max coding & agentic workflows ☆656 · Updated this week
- A platform to self-host AI on easy mode ☆171 · Updated last week
- Local Qwen3 LLM inference in a single easy-to-understand C source file with no dependencies ☆140 · Updated 3 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆541 · Updated last week
- Code repository for the paper "Competition and Attraction Improve Model Fusion" ☆161 · Updated 2 months ago