☆11 Sep 7, 2024 Updated last year
Alternatives and similar repositories for MiniCache
Users who are interested in MiniCache are comparing it to the libraries listed below.
- ☆47 Nov 25, 2024 Updated last year
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆51 Oct 18, 2024 Updated last year
- [ACL2025 Oral🔥] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling ☆24 Nov 11, 2025 Updated 4 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆55 Jul 16, 2024 Updated last year
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 Nov 12, 2024 Updated last year
- ☆11 Jan 13, 2026 Updated 2 months ago
- Implementation of Direct Preference Optimization ☆17 Jul 17, 2023 Updated 2 years ago
- ☆53 Mar 18, 2025 Updated last year
- PyTorch: training an EfficientNet model with pseudo-labels ☆11 Dec 28, 2019 Updated 6 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 Feb 17, 2025 Updated last year
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ☆30 Mar 14, 2024 Updated 2 years ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections ☆21 Oct 15, 2024 Updated last year
- ☆12 Sep 1, 2023 Updated 2 years ago
- ☆16 Sep 12, 2023 Updated 2 years ago
- ☆10 Dec 13, 2022 Updated 3 years ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend ☆11 Feb 5, 2025 Updated last year
- Mixture of Attention Heads ☆52 Oct 10, 2022 Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆13 Feb 11, 2026 Updated last month
- ☆11 Jan 3, 2024 Updated 2 years ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 Mar 25, 2025 Updated 11 months ago
- This repository contains the official implementation for the ECCV'22 paper, "SPIN: An Empirical Evaluation on Sharing Parameters of Isotr… ☆20 Sep 9, 2023 Updated 2 years ago
- Benchmarking Attention Mechanism in Vision Transformers. ☆20 Oct 10, 2022 Updated 3 years ago
- ☆12 Apr 25, 2025 Updated 10 months ago
- CoV: Chain-of-View Prompting for Spatial Reasoning ☆52 Jan 23, 2026 Updated last month
- ☆19 Apr 16, 2025 Updated 11 months ago
- ☆13 Mar 9, 2024 Updated 2 years ago
- NeRF implementation with minimal code and maximal readability using PyTorch ☆11 Aug 27, 2022 Updated 3 years ago
- Zoom in Lesions for Better Diagnosis: Attention Guided Deformation Network for WCE Image Classification ☆13 Aug 4, 2020 Updated 5 years ago
- Official PyTorch implementation of CD-MOE ☆12 Mar 13, 2026 Updated last week
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆18 Feb 29, 2024 Updated 2 years ago
- ☆40 Feb 20, 2026 Updated last month
- A simple LaTeX template for CUHK thesis. ☆14 Apr 24, 2023 Updated 2 years ago
- Structured Binary Neural Networks for Image Recognition ☆18 Nov 18, 2021 Updated 4 years ago
- Agent Memory Playground: AI Agent Memory Design & Optimization Techniques ☆32 Aug 7, 2025 Updated 7 months ago
- An evaluation suite for Retrieval-Augmented Generation (RAG). ☆23 Apr 26, 2025 Updated 10 months ago
- Code and Data for ACL 2025 Paper "Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework". ☆24 Oct 3, 2025 Updated 5 months ago
- Codes for the paper "Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex" ☆21 Jul 25, 2020 Updated 5 years ago
- Reference implementation of models from Nyonic Model Factory ☆12 May 13, 2024 Updated last year
- The official implementation of BiViT: Extremely Compressed Binary Vision Transformers ☆16 Jun 18, 2023 Updated 2 years ago