huawei-csl / SINQ
Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
☆585 Updated last week
Alternatives and similar repositories for SINQ
Users who are interested in SINQ are comparing it to the libraries listed below.
- ☆717 Updated last month
- Sparse Inferencing for transformer based LLMs ☆215 Updated 4 months ago
- ☆1,245 Updated last month
- ☆426 Updated 3 weeks ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆553 Updated last month
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆580 Updated last month
- ☆659 Updated this week
- ☆301 Updated 4 months ago
- Tencent Hunyuan A13B (short as Hunyuan-A13B), an innovative and open-source LLM built on a fine-grained MoE architecture. ☆809 Updated 5 months ago
- Docs for GGUF quantization (unofficial) ☆340 Updated 5 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆277 Updated last month
- InferX: Inference as a Service Platform ☆143 Updated this week
- Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation. ☆778 Updated this week
- A command-line interface tool for serving LLM using vLLM. ☆456 Updated 3 weeks ago
- Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, and full end-to-end reference exampl… ☆246 Updated last week
- Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https… ☆1,920 Updated this week
- The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models" ☆304 Updated this week
- An open-source implementation of Whisper ☆469 Updated 2 months ago
- ☆153 Updated 3 weeks ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆871 Updated last week
- Code for Bolmo: Byteifying the Next Generation of Language Models ☆109 Updated last week
- ☆853 Updated 3 months ago
- A framework for efficient model inference with omni-modality models ☆1,861 Updated this week
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆167 Updated 3 weeks ago
- ☆127 Updated 3 months ago
- No-code CLI designed for accelerating ONNX workflows ☆222 Updated 6 months ago
- VLLM Port of the Chatterbox TTS model ☆354 Updated 2 months ago
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆270 Updated 2 months ago
- open-source coding LLM for software engineering tasks ☆1,083 Updated 3 months ago
- Benchmark and optimize LLM inference across frameworks with ease ☆151 Updated 3 months ago