UtkarshSaxena1 / EigenAttn

☆16

Alternatives and similar repositories for EigenAttn

Users who are interested in EigenAttn are comparing it to the repositories listed below.

ArminAzizi98 / LaMDA
☆15 Updated 8 months ago
r-three / smear
☆30 Updated last year
FeiyuZhang98 / IncreLoRA
☆33 Updated last year
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆15 Updated this week
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆65 Updated last year
VijayLingam95 / SVFT
☆30 Updated 5 months ago
ROIM1998 / APT
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
☆43 Updated last year
TaiMingLu / know-dont-tell
☆15 Updated 9 months ago
SempraETY / Pruning-via-Merging
☆18 Updated 7 months ago
Lucky-Lance / SPP
[ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
☆21 Updated last year
declare-lab / della
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
☆33 Updated last year
yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆32 Updated 11 months ago
zyxxmu / DSnoT
Official Pytorch Implementation of Our Paper Accepted at ICLR 2024-- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM…
☆49 Updated last year
UNITES-Lab / MC-SMoE
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆86 Updated 3 weeks ago
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆59 Updated this week
dongwonjo / FastKV
Official Implementation of FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
☆21 Updated last month
Jikai0Wang / OPT-Tree
☆23 Updated last month
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆31 Updated last year
AkideLiu / MiniCache
☆10 Updated 10 months ago
alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆14 Updated 7 months ago
BaohaoLiao / ApiQ
[EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs
☆13 Updated 11 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆40 Updated last year
aim-uofa / LoRAPrune
☆56 Updated 7 months ago
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆170 Updated last year
UCSB-NLP-Chang / ThinkPrune
☆36 Updated 3 months ago
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆38 Updated last year
hdong920 / GRIFFIN
☆38 Updated 10 months ago
Infini-AI-Lab / S2FT
☆18 Updated 6 months ago
harveyhuang18 / EMR_Merging
[NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging
☆59 Updated 4 months ago
shoaibahmed / llm_depth_pruning
Official implementation of the paper: "A deeper look at depth pruning of LLMs"
☆15 Updated 11 months ago