[ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
☆26 · Jan 27, 2026 · Updated last month
Alternatives and similar repositories for KVTuner
Users who are interested in KVTuner are comparing it to the libraries listed below.
- Multi-Agent LLM System for Digital Scam Protection ☆12 · Dec 19, 2024 · Updated last year
- IBM Quantum Challenge Fall 2023 ☆10 · May 23, 2023 · Updated 2 years ago
- This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy cust… ☆14 · Feb 13, 2024 · Updated 2 years ago
- Code Repository for Blog - How to Productionize Large Language Models (LLMs) ☆12 · Mar 27, 2024 · Updated last year
- Projects completed under LinuxWorld Informatics Ltd. - MLOps Training. ☆12 · Aug 15, 2020 · Updated 5 years ago
- Building reliable Retrieval Augmented Generation (RAG) AI Architecture ☆13 · Jul 30, 2024 · Updated last year
- ☆20 · Feb 18, 2025 · Updated last year
- Workshop for Model Context Protocol ☆18 · Mar 27, 2025 · Updated 11 months ago
- ☆13 · Apr 22, 2024 · Updated last year
- BESA is a differentiable weight pruning technique for large language models. ☆17 · Mar 4, 2024 · Updated last year
- Lab files of IBM's Qiskit Global Summer School 2020. ☆17 · Sep 3, 2020 · Updated 5 years ago
- Improving langchain knowledge graphs using baml ☆43 · Aug 3, 2025 · Updated 6 months ago
- NLP/LLM Mlops Pipeline to dev/train/evaluation, scalable deploy and monitoring systems. ☆22 · Mar 15, 2024 · Updated last year
- Reference code base for ML Engineering in Action, Manning Publications Author: Ben Wilson ☆20 · Oct 22, 2023 · Updated 2 years ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆25 · Oct 5, 2024 · Updated last year
- We propose a lossless compression algorithm based on the NTK matrix for DNN. The compressed network yields asymptotically the same NTK a… ☆26 · Nov 23, 2023 · Updated 2 years ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆29 · Jun 30, 2025 · Updated 8 months ago
- AI Agents with Google's Gemini Pro and Gemini Pro Vision Models ☆28 · Jan 19, 2024 · Updated 2 years ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆358 · Nov 20, 2025 · Updated 3 months ago
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c… ☆40 · Apr 15, 2025 · Updated 10 months ago
- LLM Inference with Microscaling Format ☆34 · Nov 12, 2024 · Updated last year
- ☆30 · Oct 7, 2022 · Updated 3 years ago
- Implementation of HEFT (Heterogeneous Earliest Finish Time) DAG Scheduling Algorithm in Python ☆33 · Dec 17, 2022 · Updated 3 years ago
- Text to audio with Tik-Tok Voices ☆13 · Apr 6, 2023 · Updated 2 years ago
- Intelligent Document Processing with AWS AI/ML, published by Packt ☆11 · Feb 5, 2026 · Updated 3 weeks ago
- Source code for the paper "Memory-Efficient Fine-Tuning via Low-Rank Activation Compression" ☆13 · Aug 1, 2025 · Updated 6 months ago
- A powerful MCP testing tool with multi-provider LLM support (Ollama, OpenAI, Claude, Gemini). Test, debug, and develop MCP servers with a… ☆18 · Jan 7, 2026 · Updated last month
- This app allows users to easily query a PDF document using OpenAI's GPT-3 language model in Google Colab, utilizing Google Drive for stor… ☆38 · May 21, 2024 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Nov 26, 2025 · Updated 3 months ago
- PyTorch implementation of Multi-Perspective Data Augmentation for Few-shot Object Detection ☆22 · Apr 15, 2025 · Updated 10 months ago
- Keras 1D Depthwise Convolutional layer ☆10 · May 22, 2020 · Updated 5 years ago
- [CVPR2023] The official repository for paper "Learning Partial Correlation based Deep Visual Representation for Image Classification" To … ☆10 · Nov 21, 2023 · Updated 2 years ago
- TensorFlow materials ☆13 · Jan 8, 2021 · Updated 5 years ago
- Notes and Examples to get started Parallel Computing with CUDA. ☆13 · Nov 1, 2019 · Updated 6 years ago
- Home server set up ☆13 · Oct 5, 2025 · Updated 4 months ago
- Summary of conference journals in computer vision ☆35 · May 30, 2023 · Updated 2 years ago
- An example repository to use HuggingFace smolagents, Phidata and CrewAI frameworks with local LLMs ☆39 · Jan 5, 2025 · Updated last year
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated last year
- This project focuses on developing a machine learning model to classify various electrical fault types in a transmission line. The model … ☆15 · Apr 9, 2024 · Updated last year