TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
☆1,622Mar 27, 2026Updated 3 months ago
Alternatives and similar repositories for turboquant
Users that are interested in turboquant are comparing it to the libraries listed below.
- A collection of outcomes and discoveries from our legal AI research projects☆31Jun 23, 2026Updated last week
- Modular task agnostic training pipeline using LFM2 from Liquid AI with unsloth.☆16Sep 13, 2025Updated 9 months ago
- 🔬Experimental Minio (S3) Gateway for iRODS 💾☆12Aug 13, 2019Updated 6 years ago
- The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"☆167May 15, 2026Updated last month
- ☆49Jan 19, 2026Updated 5 months ago
- AFPQ code implementation☆23Nov 6, 2023Updated 2 years ago
- A CLI for managing AI skill packages☆27Jan 18, 2026Updated 5 months ago
- Official repository for Flash Local Linear Attention☆37May 28, 2026Updated last month
- Simple example for learning and serving 'MNIST' in kubernetes cluster☆10Mar 27, 2019Updated 7 years ago
- ☆16May 26, 2016Updated 10 years ago
- Code for "ATTA: Anomaly-aware Test-Time Adaptation for Out-of-Distribution Detection in Segmentation" (NeurIPS 23)☆16Apr 12, 2024Updated 2 years ago
- Content scraper/bulk downloader☆18May 10, 2023Updated 3 years ago
- ToRoLaMa: The Vietnamese Instruction-Following and Chat Model☆24Jan 4, 2024Updated 2 years ago
- ☆10Aug 9, 2022Updated 3 years ago
- The official implementation of the ICML 2023 paper OFQ-ViT☆39Oct 3, 2023Updated 2 years ago
- Uses AWS Chalice, Python, Pine Script, TradingView Charts, and the Alpaca API in conjunction to make a stock trading bot.☆12Dec 21, 2020Updated 5 years ago
- ☆44Feb 4, 2026Updated 4 months ago
- Source Code for Partial Interference☆10Dec 17, 2022Updated 3 years ago
- modified cutlass☆16Oct 26, 2020Updated 5 years ago
- Talk to your shell in natural language. Locally.☆54Feb 15, 2026Updated 4 months ago
- An intelligent question-answering system built on langchain and chatglm6b, with support for custom corpora☆10Jun 25, 2023Updated 3 years ago
- Developing a legal research tool leveraging ChatGPT / GPT-4☆14Mar 10, 2024Updated 2 years ago
- OHZI Core Library — a collection of reusable classes to build high-quality WebGL & WebGPU experiences faster. A foundational utility libr…☆13Jun 20, 2026Updated last week
- a QEMU + gem5 co-simulation framework for AMD MI300X GPU research.☆52Updated this week
- Universal MCP server installer - install any MCP server to any AI agent with one command☆20Feb 14, 2026Updated 4 months ago
- (CVPR 2026) PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction☆60May 31, 2026Updated last month
- ☆13Sep 25, 2023Updated 2 years ago
- Unreal Engine to Ethereal Engine Backend Bridge☆12Aug 4, 2023Updated 2 years ago
- Official implementation for TAO (CVPR 2025)☆20Jan 1, 2026Updated 5 months ago
- A comprehensive repository for Compute Express Link (CXL) resources: covering research papers, specifications, simulation/emulation tools…☆26Feb 24, 2026Updated 4 months ago
- A WebAssembly eBPF runtime based on wasmtime in Rust☆11Feb 20, 2023Updated 3 years ago
- Compression schema for gradients of activations in backward pass☆45Jul 26, 2023Updated 2 years ago
- Simple Automatic Number Plate Recognition using Yolov7 engine & EasyOCR library☆12Nov 4, 2022Updated 3 years ago
- A new memory mapping interface for efficient direct user-space access to byte-addressable storage, published in MICRO2022.☆16Sep 29, 2022Updated 3 years ago
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚☆22Jul 14, 2025Updated 11 months ago
- LEMMA: Logical Engine for Multi-domain Mathematical Analysis☆28Feb 14, 2026Updated 4 months ago
- PyTorch Implementation of Image Generation with a Sphere Encoder☆45May 20, 2026Updated last month
- Repository for "Echoes of the Coliseum: Towards 3D Live streaming of Sports Events"☆50Jun 17, 2026Updated last week
- ☆114Mar 27, 2026Updated 3 months ago