TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
☆1,205Mar 27, 2026Updated last month
Alternatives and similar repositories for turboquant
Users that are interested in turboquant are comparing it to the libraries listed below.
- ☆162Mar 30, 2026Updated 3 weeks ago
- Rust library implementing the Toorani-Beheshti signcryption scheme☆13Aug 15, 2023Updated 2 years ago
- A CLI for managing AI skill packages☆27Jan 18, 2026Updated 3 months ago
- Simple example for learning and serving 'MNIST' in kubernetes cluster☆10Mar 27, 2019Updated 7 years ago
- SVG Analysis and generation tools for commonly seen SVG attachment phishing☆56Sep 24, 2025Updated 7 months ago
- Deduplication over dis-aggregated memory for Serverless Computing☆14Mar 21, 2022Updated 4 years ago
- The official implementation of the ICML 2023 paper OFQ-ViT☆39Oct 3, 2023Updated 2 years ago
- Talk to your shell in natural language. Locally.☆54Feb 15, 2026Updated 2 months ago
- Vector search with Pinecone and OpenAI to search through a contract law textbook. If downloaded, remember to install all dependencies. Refer…☆11Mar 30, 2023Updated 3 years ago
- A QEMU + gem5 co-simulation framework for AMD MI300X GPU research.☆44Apr 19, 2026Updated last week
- Circuit-level PDP-11/34 emulator☆67Apr 8, 2026Updated 3 weeks ago
- Evaluation Suite for NVMe devices☆14Nov 14, 2024Updated last year
- Universal MCP server installer - install any MCP server to any AI agent with one command☆18Feb 14, 2026Updated 2 months ago
- Extended GitHub MCP Server with additional tools for pull request review comment functionality☆22Apr 1, 2025Updated last year
- A comprehensive repository for Compute Express Link (CXL) resources: covering research papers, specifications, simulation/emulation tools…☆25Feb 24, 2026Updated 2 months ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 2 months ago
- Compression schema for gradients of activations in backward pass☆45Jul 26, 2023Updated 2 years ago
- A new memory mapping interface for efficient direct user-space access to byte-addressable storage, published in MICRO2022.☆15Sep 29, 2022Updated 3 years ago
- ☆6,508Updated this week
- On demand communication☆32Apr 16, 2026Updated last week
- Production-grade template for Pi-powered Chrome extensions☆47Updated this week
- Deploy the SC2 system on Kubernetes.☆10May 7, 2025Updated 11 months ago
- A Tensorflow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio…☆10Dec 18, 2019Updated 6 years ago
- ☆13Feb 19, 2025Updated last year
- Classical Chinese poetry generation based on pytorch_rnn☆11Oct 24, 2021Updated 4 years ago
- tinypy is a minimalist implementation of python☆16Dec 30, 2010Updated 15 years ago
- A portable, embeddable implementation of the BASIC programming language.☆16Feb 21, 2013Updated 13 years ago
- ☆12May 30, 2023Updated 2 years ago
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent☆16Sep 8, 2022Updated 3 years ago
- ☆10Aug 28, 2018Updated 7 years ago
- ☆13Jan 14, 2026Updated 3 months ago
- Triton for OpenCL backend, and use mlir-translate to get source OpenCL code☆27Aug 27, 2025Updated 8 months ago
- Code needed to reproduce results from my ICLR 2019 paper on fixed-point quantization of the backprop algorithm.☆10Jan 24, 2019Updated 7 years ago
- ☆18May 28, 2024Updated last year
- ☆18Jul 2, 2024Updated last year
- Traction adaptive motion planning using sampling augmented adaptive RTI☆11Jun 6, 2021Updated 4 years ago
- Declarative audio synthesis for the web☆155Updated this week
- Implementation of Hippoformer, Integrating Hippocampus-inspired Spatial Memory with Transformers☆50Feb 5, 2026Updated 2 months ago
- A complete end-to-end system that takes mathematical problems and automatically generates polished educational videos☆32Jan 3, 2026Updated 3 months ago