TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
☆1,393Mar 27, 2026Updated last month
Alternatives and similar repositories for turboquant
Users who are interested in turboquant are comparing it to the libraries listed below.
- ☆13Jan 10, 2019Updated 7 years ago
- The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"☆108Updated this week
- Source code for the article on generating thumbnails in AWS Lambda.☆15May 2, 2023Updated 3 years ago
- A CLI for managing AI skill packages☆27Jan 18, 2026Updated 4 months ago
- A simple GPT-3 interface to automate core legal writing tasks☆13Mar 8, 2023Updated 3 years ago
- Official repository for Flash Local Linear Attention☆23Apr 23, 2026Updated 3 weeks ago
- Datasette plugin for searching all searchable tables at once☆29Nov 3, 2025Updated 6 months ago
- Vue.js and AWS Amplify SaaS template site with pre-built login flow using AWS Amplify☆20Jul 14, 2023Updated 2 years ago
- Simple example for learning and serving 'MNIST' in kubernetes cluster☆10Mar 27, 2019Updated 7 years ago
- A large dataset of 4.2m Java source code samples with parallel descriptions, for code search and code summarization studies.☆15Feb 24, 2022Updated 4 years ago
- ☆11May 4, 2023Updated 3 years ago
- Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu…☆90Mar 6, 2026Updated 2 months ago
- ToRoLaMa: The Vietnamese Instruction-Following and Chat Model☆24Jan 4, 2024Updated 2 years ago
- ☆44Feb 4, 2026Updated 3 months ago
- Talk to your shell in natural language. Locally.☆54Feb 15, 2026Updated 3 months ago
- modified cutlass☆16Oct 26, 2020Updated 5 years ago
- An intelligent question-answering system built on langchain and chatglm6b, with support for custom corpora☆10Jun 25, 2023Updated 2 years ago
- [ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC#☆73Feb 12, 2025Updated last year
- OHZI Core Library — a collection of reusable classes to build high-quality WebGL & WebGPU experiences faster. A foundational utility libr…☆13Mar 31, 2026Updated last month
- A QA chatbot that answers questions from the contents of given docs☆12Apr 20, 2023Updated 3 years ago
- a QEMU + gem5 co-simulation framework for AMD MI300X GPU research.☆48Apr 29, 2026Updated 2 weeks ago
- Universal MCP server installer - install any MCP server to any AI agent with one command☆19Feb 14, 2026Updated 3 months ago
- Supplementary code for the paper "UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Lear…☆15Nov 10, 2022Updated 3 years ago
- ☆13Sep 25, 2023Updated 2 years ago
- Unreal Engine to Ethereal Engine Backend Bridge☆12Aug 4, 2023Updated 2 years ago
- A comprehensive repository for Compute Express Link (CXL) resources: covering research papers, specifications, simulation/emulation tools…☆25Feb 24, 2026Updated 2 months ago
- Comparing Deep Learning Inference of Pytorch models running on CPU, CUDA and TensorRT☆17Feb 20, 2022Updated 4 years ago
- Fusing 2D Material World Knowledge on 3D Geometry☆55Mar 23, 2026Updated last month
- [SIGGRAPH-ASIA 2025] Official implementation of "VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Model…☆134Mar 19, 2026Updated 2 months ago
- reddit account/upvote script☆25Nov 3, 2022Updated 3 years ago
- Invariant Feature Regularization for Fair Face Recognition (ICCV'23)☆15Oct 23, 2023Updated 2 years ago
- LEMMA: Logical Engine for Multi-domain Mathematical Analysis☆28Feb 14, 2026Updated 3 months ago
- Deploy the SC2 system on Kubernetes.☆11May 7, 2025Updated last year
- A Tensorflow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio…☆10Dec 18, 2019Updated 6 years ago
- Intent recognition and slot filling implemented with pytorch+bert☆11May 30, 2023Updated 2 years ago
- Classical Chinese poetry generation based on pytorch_rnn☆11Oct 24, 2021Updated 4 years ago
- ☆6,793May 9, 2026Updated last week
- IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse☆101Mar 14, 2026Updated 2 months ago
- ComfyUI Viewer extension to provide OpenReel for video editing☆74Feb 21, 2026Updated 2 months ago