AkideLiu/MiniCache

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AkideLiu/MiniCache)

AkideLiu / MiniCache

☆14

Alternatives and similar repositories for MiniCache

Users that are interested in MiniCache are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

mutonix / pyramidinfer
View on GitHub
☆47Nov 25, 2024Updated last year
RobertCsordas / switchhead
View on GitHub
☆16Jun 11, 2025Updated last year
yangyifei729 / KVSharer
View on GitHub
Source code of paper ''KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing''
☆31Oct 24, 2024Updated last year
ThisisBillhe / ZipCache
View on GitHub
[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
☆33Mar 30, 2025Updated last year
sail-sg / SimLayerKV
View on GitHub
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54Oct 18, 2024Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
AMD-AGI / torchtitan-amd
View on GitHub
A PyTorch native platform for training generative AI models
☆17Jun 30, 2026Updated 2 weeks ago
ml-lab-htw / llm-trees
View on GitHub
Official repo: “Oh LLM, I’m Asking Thee, Please Give Me a Decision Tree”: Zero-Shot Decision Tree Induction and Embedding with Large Lang…
☆15Updated this week
ziplab / DDP
View on GitHub
[CVPRW 2026 Oral] Less Detail, Better Answers: Degradation-Driven Prompting for VQA
☆20Apr 25, 2026Updated 2 months ago
changlin31 / AutoProg
View on GitHub
(CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers
☆25Feb 26, 2025Updated last year
gofreelee / SpaceServe
View on GitHub
☆31Jul 13, 2026Updated last week
ziplab / Pyramid-Sparse-Attention
View on GitHub
Official PyTorch implementation of [PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation](https://arxiv.org/abs…
☆25Jan 25, 2026Updated 5 months ago
NotCraft / ArxivDaily
View on GitHub
ArxivDaily
☆13Updated this week
XiaoyuanXie / xiaoyuanxie.github.io
View on GitHub
Personal Page
☆12Jul 4, 2026Updated 2 weeks ago
Aiden0526 / Aristotle
View on GitHub
Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework".
☆28Oct 3, 2025Updated 9 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
ziplab / QLLM
View on GitHub
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆31Mar 12, 2024Updated 2 years ago
HillZhang1999 / RobustGEC
View on GitHub
Code & Data for our Paper "RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation" (EMNLP 2023)
☆17Jan 23, 2024Updated 2 years ago
C0-Design / MemoryFormer
View on GitHub
An implementation is provided here for the NeurIPS2024 paper "MemoryFormer : Minimize Transformer Computation by Removing Fully-Connected…
☆16Mar 24, 2026Updated 3 months ago
Dragonsson / Pseudo_efficientNet
View on GitHub
Pytorch--使用伪标签训练efficientNet模型
☆11Dec 28, 2019Updated 6 years ago
JIA-Lab-research / Q-LLM
View on GitHub
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
☆54Jul 16, 2024Updated 2 years ago
xinke-wang / LVLM-Playground
View on GitHub
[ICLR2025] Are Large Vision Language Models Good Game Players?
☆13Mar 3, 2025Updated last year
GATECH-EIC / Castling-ViT
View on GitHub
[CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
☆31Mar 14, 2024Updated 2 years ago
lych1233 / GAMMA-human-ai-collaboration
View on GitHub
☆11Jan 13, 2026Updated 6 months ago
okarthikb / DPO
View on GitHub
Implementation of Direct Preference Optimization
☆17Jul 17, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
OedonLestrange42 / WebNovelBench
View on GitHub
code for paper webnovelbench
☆15Jul 2, 2025Updated last year
GoJunHyeong / SpatialBias
View on GitHub
☆10Dec 13, 2022Updated 3 years ago
ziplab / efficient-stable-diffusion
View on GitHub
☆16Sep 12, 2023Updated 2 years ago
ModelTC / QLLM
View on GitHub
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆39Mar 11, 2024Updated 2 years ago
roymiles / VeLoRA
View on GitHub
[NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections
☆22Oct 15, 2024Updated last year
L1aoXingyu / llm-infer-bench
View on GitHub
☆12Sep 1, 2023Updated 2 years ago
ThisisBillhe / NAR
View on GitHub
[ICCV 2025] The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"
☆62Apr 5, 2025Updated last year
ThisisBillhe / ZipAR
View on GitHub
[ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…
☆51Mar 25, 2025Updated last year
cosdt / vllm-ascend
View on GitHub
See vLLM official support: https://github.com/vllm-project/vllm-ascend
☆11Feb 5, 2025Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
apple / ml-spin
View on GitHub
This repository contains the official implementation for the ECCV'22 paper, "SPIN: An Empirical Evaluation on Sharing Parameters of Isotr…
☆20Sep 9, 2023Updated 2 years ago
DavidFanzz / llm_decoding
View on GitHub
☆12Apr 25, 2025Updated last year
RUCAIBox / ELMER
View on GitHub
This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie…
☆26Oct 27, 2022Updated 3 years ago
yikangshen / MoA
View on GitHub
Mixture of Attention Heads
☆53Oct 10, 2022Updated 3 years ago
prometheus-eval / scaling-evaluation-compute
View on GitHub
Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"
☆12Mar 25, 2025Updated last year
KuntaiDu / vllm
View on GitHub
A high-throughput and memory-efficient inference and serving engine for LLMs
☆13Jun 10, 2026Updated last month
Corleone-Huang / RealCustomProject
View on GitHub
☆19Apr 16, 2025Updated last year