huawei-csl / SINQ
Welcome to the official repository of SINQ, a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
☆570 · Updated 2 weeks ago
Alternatives and similar repositories for SINQ
Users who are interested in SINQ are comparing it to the libraries listed below.
- ☆702 · Updated last month
- Sparse Inferencing for transformer based LLMs ☆208 · Updated 3 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆560 · Updated 2 months ago
- Docs for GGUF quantization (unofficial) ☆312 · Updated 4 months ago
- ☆365 · Updated this week
- ☆477 · Updated this week
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆155 · Updated last week
- ☆1,161 · Updated 2 weeks ago
- InferX: Inference as a Service Platform ☆138 · Updated this week
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,622 · Updated this week
- Enhancing LLMs with LoRA ☆176 · Updated 3 weeks ago
- ☆300 · Updated 3 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆238 · Updated 2 weeks ago
- No-code CLI designed for accelerating ONNX workflows ☆216 · Updated 5 months ago
- LLM Inference on consumer devices ☆125 · Updated 8 months ago
- Advanced quantization toolkit for LLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with Transform… ☆712 · Updated this week
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆241 · Updated last week
- ☆126 · Updated 2 months ago
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ☆666 · Updated 6 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆829 · Updated this week
- ☆132 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance ☆1,329 · Updated this week
- Samples of good AI generated CUDA kernels ☆91 · Updated 5 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆44 · Updated 3 weeks ago
- ☆144 · Updated 3 months ago
- BUDDIE is the first full-stack open-source AI voice interaction solution, providing a complete end-to-end system from hardware design to … ☆160 · Updated 3 months ago
- VLLM Port of the Chatterbox TTS model ☆333 · Updated last month
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆145 · Updated 4 months ago
- ☆173 · Updated 3 months ago
- Big & Small LLMs working together ☆1,200 · Updated this week