NoakLiu / Efficient-Foundation-Models-Survey

Efficient-Large-Foundation-Model-Inference: A-Perspective-From-Model-and-System-Co-Design [Efficient ML System & Model]

☆22

Alternatives and similar repositories for Efficient-Foundation-Models-Survey:

Users who are interested in Efficient-Foundation-Models-Survey are comparing it to the libraries listed below.

NoakLiu / MT2ST
Accelerating Embedding Training on Multitask Scenario [Efficient ML Model]
☆11 · Updated 3 months ago
NoakLiu / DRTR
Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model]
☆11 · Updated last month
NoakLiu / GraphSnapShot
GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System]
☆30 · Updated 4 months ago
Osilly / dynamic_llava
[ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
☆27 · Updated 4 months ago
zyxxmu / cam
PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆35 · Updated 9 months ago
AkideLiu / MiniCache
☆8 · Updated 7 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆34 · Updated 9 months ago
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆69 · Updated last month
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆92 · Updated 5 months ago
LiuXiaoxuanPKU / OSD
☆47 · Updated 4 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆59 · Updated 5 months ago
hemingkx / SWIFT
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆45 · Updated last month
OpenSparseLLMs / Linearization
☆39 · Updated 3 weeks ago
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆28 · Updated 3 weeks ago
hemingkx / TokenSkip
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
☆114 · Updated 3 weeks ago
YouAreSpecialToMe / QST
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
☆42 · Updated 5 months ago
UNITES-Lab / MC-SMoE
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆77 · Updated 10 months ago
mit-han-lab / x-attention
XAttention: Block Sparse Attention with Antidiagonal Scoring
☆134 · Updated last week
mutonix / pyramidinfer
☆39 · Updated 4 months ago
OpenSparseLLMs / MoM
☆73 · Updated 3 weeks ago
yule-BUAA / MergeLLM
Code for Merging Large Language Models
☆29 · Updated 8 months ago
sail-sg / SimLayerKV
The official implementation of the paper SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆44 · Updated 5 months ago
Persdre / NeurIPS-2024-LLM-Papers
Accepted LLM Papers in NeurIPS 2024
☆34 · Updated 5 months ago
hychaochao / EMMA
The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark"
☆47 · Updated 2 weeks ago
FFY0 / AdaKV
The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
☆69 · Updated 2 months ago
kokolerk / R1-V-GUI-agent
☆13 · Updated last week
Lucky-Lance / SPP
[ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
☆19 · Updated 10 months ago
sramshetty / mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆34 · Updated 10 months ago
yxli2123 / LoSparse
☆50 · Updated last year
TemporaryLoRA / Block-Attention
☆18 · Updated 3 weeks ago