Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
☆612Feb 23, 2026Updated last month
Alternatives and similar repositories for SINQ
Users that are interested in SINQ are comparing it to the libraries listed below.
- Welcome to the official repository of AC-LORA: (Almost) Training-Free Access Control-Aware Multi-Modal LLMs, a mechanism that provides tr…☆21Nov 14, 2025Updated 5 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.☆51Oct 29, 2025Updated 5 months ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend☆11Feb 5, 2025Updated last year
- Two-Step Quantization on AlexNet☆13Jun 29, 2018Updated 7 years ago
- Home of ALP/GraphBLAS and ALP/Pregel, featuring shared- and distributed-memory auto-parallelisation of linear algebraic and vertex-centri…☆33Apr 2, 2026Updated 2 weeks ago
- ☆11Feb 20, 2025Updated last year
- ☆19Feb 23, 2026Updated last month
- Yet Another (LLM) Web UI, made with Gemini☆12Dec 25, 2024Updated last year
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input a target size and the toolchain w…☆108Updated this week
- Activation-aware Singular Value Decomposition for Compressing Large Language Models☆91Oct 22, 2024Updated last year
- ☆45Oct 28, 2025Updated 5 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"☆123Oct 15, 2025Updated 6 months ago
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.☆25Sep 1, 2025Updated 7 months ago
- Generate a llama-quantize command to copy the quantization parameters of any GGUF☆32Jan 23, 2026Updated 2 months ago
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"☆21Mar 25, 2026Updated 3 weeks ago
- Schema-aware JSON compression with millisecond lookups — cut transfer/storage while enabling exists /pos queries. (Demo + wheels; core is…☆24Feb 21, 2026Updated last month
- ☆22Sep 20, 2025Updated 7 months ago
- A fully autonomous agent that accesses the browser and performs tasks.☆18Apr 25, 2025Updated 11 months ago
- Watch for file changes and auto restart an application using fork checkpoints to continue the process (for quick live development)☆13Dec 30, 2021Updated 4 years ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Sep 24, 2023Updated 2 years ago
- Makes llama.cpp easy to use.☆12May 14, 2025Updated 11 months ago
- A fast DNN library in C++, and a version in Python for prototyping☆20Feb 14, 2026Updated 2 months ago
- With dri3 we can configure in ~/.drirc which GPU a program with a given name should be rendered on. This is a small utility to make this p…☆10Oct 21, 2016Updated 9 years ago
- ☆11Sep 18, 2023Updated 2 years ago
- A chat UI for Llama.cpp☆16Apr 10, 2026Updated last week
- Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No Dependencies.☆23Nov 26, 2025Updated 4 months ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"☆73Jul 8, 2025Updated 9 months ago
- ☆168Jun 22, 2025Updated 9 months ago
- [ICLR2026] codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)☆795Feb 4, 2026Updated 2 months ago
- Generate high resolution videos with a custom voice and appearance, based on LTX-2/LTX-2.3 + Identity In-Context LoRA☆261Mar 24, 2026Updated 3 weeks ago
- Transformer experiments☆16May 8, 2023Updated 2 years ago
- Ubiquité : Open-source Perplexity clone with multi-LLM support and KaTeX math rendering.☆47Nov 14, 2025Updated 5 months ago
- ☆12Jan 4, 2024Updated 2 years ago
- An extension for oobabooga/text-generation-webui that automatically unloads and reloads your model.☆17Apr 22, 2024Updated last year
- Super-simple, fully Rust powered "memory" (doc store + semantic search) for LLM projects, semantic search, etc.☆66Oct 9, 2023Updated 2 years ago
- 🤖 AI-powered CLI for file reorganization. Runs fully locally — no data leaves your machine.☆20Jul 2, 2025Updated 9 months ago
- Fast low-bit matmul kernels in Triton☆443Apr 4, 2026Updated 2 weeks ago
- LLM KV cache compression made easy☆1,042Updated this week
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati…☆29May 6, 2025Updated 11 months ago