hilllief/polarquant-kv

LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss

☆57

Alternatives and similar repositories for polarquant-kv

Users who are interested in polarquant-kv are comparing it to the repositories listed below.

gunpowder78 / RobustVideoMatting
☆13 · Aug 30, 2021 · Updated 4 years ago

charlescao92 / xrtc-webrtc-m96
A signaling server implemented in Go, plus a WebRTC push/pull streaming server and a PC-side push/pull streaming SDK built on webrtc-m96
☆14 · May 13, 2023 · Updated 3 years ago

KuntaiDu / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆13 · Jun 10, 2026 · Updated 3 weeks ago

yizhiru / scala-AC
Scala implementation of Aho-Corasick algorithm
☆15 · May 21, 2026 · Updated last month

tpoisonooo / open-r1
Fully open reproduction of DeepSeek-R1
☆11 · Mar 24, 2025 · Updated last year

shijunbao / prompt-manager
Centrally manage all your prompts.
☆14 · Nov 27, 2024 · Updated last year

thu-ml / Efficient-Diffusion-Alignment
Official Codebase for "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control" (NeurIPS 2024)
☆15 · Oct 29, 2024 · Updated last year

JJXiangJiaoJun / cutlass_gemv
GEMV implementation with CUTLASS
☆21 · Aug 21, 2025 · Updated 10 months ago

kyegomez / FastFF
Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling"
☆16 · Nov 11, 2024 · Updated last year

lyapple2008 / FaceBeauty
☆25 · Jul 3, 2018 · Updated 8 years ago

sands321 / znote
🖖 A graph-based note-taking system that aims to increase how often personal notes get used!
☆11 · Jan 17, 2021 · Updated 5 years ago

Bastian / Abstractive-Summarization-of-Meetings
The source code for my bachelor's thesis "Abstractive Summarization of Meetings"
☆21 · May 21, 2021 · Updated 5 years ago

navindevan / MeetSummAIzer-AzureOpenAI
A Python-based tool that uses Azure OpenAI to process and summarize meeting transcripts from platforms like Microsoft Teams and Skype. Si…
☆16 · Jan 29, 2025 · Updated last year

ensariskin / 5-Stage-Pipeline-RV32I
5 Stage Pipelined RISC V Processor Design for RV32I Instruction Set
☆10 · Sep 15, 2022 · Updated 3 years ago

dougallj / asil
☆35 · Jun 15, 2026 · Updated 3 weeks ago

Freder-chen / ReasonGenRM
A simple implementation of ReasonGenRM.
☆19 · Apr 21, 2025 · Updated last year

SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…
☆58 · Jul 4, 2023 · Updated 3 years ago

ml-inory / melotts.axera
MeloTTS demo on Axera
☆14 · Jul 1, 2026 · Updated last week

Lauorie / DFT
Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629
☆23 · Oct 14, 2025 · Updated 8 months ago

Mythos-Rudy / mnbvc-fasttext-classification
this repo is mnbvc text quality classification using fastText
☆16 · Oct 2, 2023 · Updated 2 years ago

timmy1688 / wtian
☆20 · Jan 25, 2026 · Updated 5 months ago

jadepeng / bertTokenizer
java implementation of Bert Tokenizer, support output onnx tensor for onnx model inference
☆13 · Sep 4, 2023 · Updated 2 years ago

Paul33333 / SFT-and-DPO
This is a detailed code demo on how to conduct Full-Param Supervised Fine-tuning (SFT) and DPO (Direct Preference Optimization)
☆19 · Jan 9, 2025 · Updated last year

muggle-stack / sensevoice_cpp
☆25 · Mar 8, 2026 · Updated 4 months ago

openjdk / jdk21u-dev
https://openjdk.org/projects/jdk-updates
☆40 · Updated this week

junkangwu / alpha-DPO
[ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization"
☆31 · Jan 10, 2026 · Updated 5 months ago

21335732529sky / negative_supervision
The implementation of Text Classification with Negative Supervision (ACL, 2020)
☆10 · Oct 8, 2020 · Updated 5 years ago

InfiniTensor / InfiniLM
☆151 · Updated this week

uservan / speculative_thinking
☆34 · Oct 13, 2025 · Updated 8 months ago

serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago

DezhiKong00 / Sentencepiece-chinese-bbpe
Tokenizing Chinese corpora with Sentencepiece
☆13 · Nov 30, 2023 · Updated 2 years ago

ckvv / lumos
Run AI models locally and offline on your machine
☆11 · May 8, 2025 · Updated last year

metaimagine / ImLeile
👂 Typing is slow, talk to me. The project name means 'I am tired' in Chinese (我累了). This is an AI efficiency assistant, complete your d…
☆16 · Jun 8, 2024 · Updated 2 years ago

seanzhang-zhichen / Qwen-WisdomVast
Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …
☆17 · Apr 12, 2024 · Updated 2 years ago

neavo / KeywordGachaModel
☆17 · Jan 31, 2025 · Updated last year

aresbit / fetch-skill
fetch-skill
☆163 · Mar 22, 2026 · Updated 3 months ago

XMUDeepLIT / Translatotron-V
Code for "Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation" (Findings of ACL 2024)
☆16 · Jul 4, 2024 · Updated 2 years ago

CelVoxes / thinkR
gpt-o1 like chain of thoughts with local LLMs in R
☆31 · Oct 15, 2024 · Updated last year

unia-sik / riscVivid
A RISC-V processor simulator
☆29 · Apr 23, 2026 · Updated 2 months ago