huawei-csl / SINQ
Welcome to the official repository of SINQ, a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
☆20 Updated this week
Alternatives and similar repositories for SINQ
Users who are interested in SINQ are comparing it to the libraries listed below.
- ☆145 Updated 2 months ago
- Sparse Inferencing for transformer based LLMs ☆201 Updated last month
- ☆674 Updated this week
- Enhancing LLMs with LoRA ☆159 Updated 3 weeks ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS over OpenAI endpoints. ☆211 Updated this week
- Docs for GGUF quantization (unofficial) ☆275 Updated 2 months ago
- It takes a village to raise a child: Google DeepThink 🧠 but in LangGraph and free - an original algorithm for collaborative agents using… ☆127 Updated 2 weeks ago
- InferX: Inference as a Service Platform ☆136 Updated this week
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆279 Updated last month
- VLLM Port of the Chatterbox TTS model ☆306 Updated last month
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆267 Updated last month
- ☆165 Updated last month
- ☆300 Updated 2 months ago
- A persistent local memory for AI, LLMs, or Copilot in VS Code. ☆150 Updated last week
- ☆388 Updated this week
- Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiob… ☆202 Updated 2 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,385 Updated this week
- fully local, temporally aware natural language file search on your pc! even without a GPU. find relevant files using natural language i… ☆120 Updated last week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆42 Updated 3 weeks ago
- A platform to self-host AI on easy mode ☆171 Updated this week
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆545 Updated last month
- Benchmark and optimize LLM inference across frameworks with ease ☆113 Updated 3 weeks ago
- ☆178 Updated last month
- BUDDIE is the first full-stack open-source AI voice interaction solution, providing a complete end-to-end system from hardware design to … ☆155 Updated last month
- Official python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware. ☆573 Updated 2 weeks ago
- An open-source implementation of Whisper ☆438 Updated last week
- High-Performance Text Deduplication Toolkit ☆57 Updated last month
- Fully Open Language Models with Stellar Performance ☆246 Updated 2 months ago
- ☆33 Updated 6 months ago
- GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an inter… ☆97 Updated this week