huawei-csl / SINQ
Welcome to the official repository of SINQ, a novel, fast, and high-quality quantization method designed to make any large language model smaller while preserving accuracy.
☆579 · Updated 2 weeks ago
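SINQ's actual algorithm isn't reproduced on this page, but the family it belongs to, weight-only quantization, can be sketched in a few lines. The snippet below is a generic illustration (not SINQ's method): symmetric per-group quantization that maps float weights to low-bit signed integers with one float scale per group, then dequantizes them at inference time. All names here (`quantize_group`, `dequantize_group`) are hypothetical.

```python
# Illustrative weight-only quantization sketch (NOT SINQ's actual algorithm):
# symmetric per-group 4-bit quantization with a single float scale per group.

def quantize_group(weights, bits=4):
    """Map a group of float weights to signed ints in [-(2^(b-1)-1), 2^(b-1)-1]."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid div-by-zero for all-zero groups
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from integer codes and the group scale."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.34, 0.91, -0.27, 0.05, -0.88, 0.44]
q, scale = quantize_group(weights)
approx = dequantize_group(q, scale)
# Rounding error per weight is bounded by about scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q, scale, max_err)
```

Storing 4-bit codes plus one scale per group is what shrinks the model; methods like SINQ differ in how they choose scales and codes to keep this rounding error from degrading accuracy.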
Alternatives and similar repositories for SINQ
Users interested in SINQ are comparing it to the libraries listed below.
- ☆712 · Updated last week
- Sparse inferencing for transformer-based LLMs ☆215 · Updated 3 months ago
- ☆414 · Updated 3 weeks ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆569 · Updated 2 weeks ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆527 · Updated 3 weeks ago
- ☆1,226 · Updated 3 weeks ago
- ☆144 · Updated last week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆247 · Updated last month
- Docs for GGUF quantization (unofficial) ☆330 · Updated 4 months ago
- No-code CLI designed for accelerating ONNX workflows ☆219 · Updated 5 months ago
- Research code artifacts for Code World Model (CWM), including inference tools, reproducibility, and documentation ☆751 · Updated 2 months ago
- Benchmark and optimize LLM inference across frameworks with ease ☆141 · Updated 2 months ago
- ☆145 · Updated 4 months ago
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints ☆260 · Updated last week
- InferX: Inference as a Service platform ☆142 · Updated this week
- ☆127 · Updated 3 months ago
- An open-source implementation of Whisper ☆466 · Updated last month
- ☆533 · Updated this week
- ☆301 · Updated 4 months ago
- Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, adaptive schemes, and seamless integration wi… ☆753 · Updated this week
- vLLM port of the Chatterbox TTS model ☆342 · Updated last month
- Enhancing LLMs with LoRA ☆191 · Updated last month
- The code repository for the paper "Competition and Attraction Improve Model Fusion" ☆167 · Updated 3 months ago
- Plug-and-play memory for LLMs in 3 lines of code. Add persistent, intelligent, human-like memory and recall to any model in minutes ☆217 · Updated 2 weeks ago
- ☆176 · Updated 4 months ago
- VPTQ: a flexible, extreme low-bit quantization algorithm ☆668 · Updated 7 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,827 · Updated this week
- A command-line interface tool for serving LLMs using vLLM ☆454 · Updated last week
- LLM inference on consumer devices ☆125 · Updated 8 months ago
- Tencent Hunyuan-A13B (Hunyuan-A13B for short), an innovative open-source LLM built on a fine-grained MoE architecture ☆808 · Updated 5 months ago