huawei-csl / SINQ
Welcome to the official repository of SINQ, a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
☆587 · Updated 3 weeks ago
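For readers unfamiliar with weight quantization, the sketch below shows the general idea in plain PyTorch: full-precision weights are replaced by low-bit integer codes plus per-group scales, trading a small reconstruction error for a large memory saving. This is an illustrative round-to-nearest int4 example only; the function names and group size are assumptions, and it is not SINQ's actual API or algorithm.

```python
# Illustrative only: generic round-to-nearest int4 weight quantization.
# Not SINQ's API; function names and group size are assumptions.
import torch

def quantize_int4(w: torch.Tensor, group_size: int = 64):
    """Symmetric per-group round-to-nearest quantization to 4-bit codes."""
    w_grouped = w.reshape(-1, group_size)                    # [n_groups, group_size]
    scale = w_grouped.abs().amax(dim=1, keepdim=True) / 7.0  # map max magnitude to int4 range [-8, 7]
    q = torch.clamp(torch.round(w_grouped / scale), -8, 7)   # integer codes
    return q.to(torch.int8), scale

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor, shape):
    """Reconstruct an approximate full-precision weight from codes and scales."""
    return (q.float() * scale).reshape(shape)

w = torch.randn(4096, 4096)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s, w.shape)
print("mean abs reconstruction error:", (w - w_hat).abs().mean().item())
```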
Alternatives and similar repositories for SINQ
Users interested in SINQ are comparing it to the libraries listed below.
- ☆720 · Updated last month
- Sparse Inferencing for transformer based LLMs ☆218 · Updated 5 months ago
- WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups ov… ☆572 · Updated this week
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆560 · Updated 2 months ago
- ☆728 · Updated this week
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆592 · Updated last month
- ☆430 · Updated last month
- ☆302 · Updated 5 months ago
- ☆159 · Updated last month
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆203 · Updated last month
- ☆1,268 · Updated 2 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆283 · Updated 2 months ago
- Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation. ☆795 · Updated 3 weeks ago
- ☆148 · Updated 5 months ago
- Docs for GGUF quantization (unofficial) ☆348 · Updated 6 months ago
- Tencent Hunyuan A13B (short as Hunyuan-A13B), an innovative and open-source LLM built on a fine-grained MoE architecture. ☆811 · Updated 6 months ago
- ☆129 · Updated 4 months ago
- The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models" ☆314 · Updated this week
- VLLM Port of the Chatterbox TTS model ☆361 · Updated 3 months ago
- Enhancing LLMs with LoRA ☆205 · Updated 2 months ago
- ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows. ☆573 · Updated 3 weeks ago
- An open-source implementation of Whisper ☆475 · Updated 2 months ago
- Plug-and-play memory for LLMs in 3 lines of code. Add persistent, intelligent, human-like memory and recall to any model in minutes. ☆244 · Updated last month
- BUDDIE is the first full-stack open-source AI voice interaction solution, providing a complete end-to-end system from hardware design to … ☆236 · Updated 5 months ago
- No-code CLI designed for accelerating ONNX workflows ☆224 · Updated 7 months ago
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆356 · Updated 2 weeks ago
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob… ☆228 · Updated 5 months ago
- Building blocks for agents in C++ ☆131 · Updated last week
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆473 · Updated last month
- InferX: Inference as a Service Platform ☆146 · Updated this week