NoakLiu / Efficient-Foundation-Models-Survey
Efficient-Large-Foundation-Model-Inference: A-Perspective-From-Model-and-System-Co-Design [Efficient ML System & Model]
☆22 · Updated last month
Alternatives and similar repositories for Efficient-Foundation-Models-Survey:
Users interested in Efficient-Foundation-Models-Survey are comparing it to the repositories listed below.
- Accelerating Embedding Training on Multitask Scenario [Efficient ML Model] ☆11 · Updated 3 months ago
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆11 · Updated last month
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System] ☆30 · Updated 4 months ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆27 · Updated 4 months ago
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆35 · Updated 9 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆34 · Updated 9 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆69 · Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆92 · Updated 5 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆59 · Updated 5 months ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆45 · Updated last month
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆28 · Updated 3 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆114 · Updated 3 weeks ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆42 · Updated 5 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆77 · Updated 10 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆134 · Updated last week
- Codes for Merging Large Language Models ☆29 · Updated 8 months ago
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆44 · Updated 5 months ago
- Accepted LLM Papers in NeurIPS 2024 ☆34 · Updated 5 months ago
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark" ☆47 · Updated 2 weeks ago
- The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆69 · Updated 2 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆19 · Updated 10 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆34 · Updated 10 months ago