hwang595 / Cuttlefish

The implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning"

☆43

Alternatives and similar repositories for Cuttlefish

Users who are interested in Cuttlefish are comparing it to the libraries listed below.

JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80 Updated 2 years ago
yxli2123 / LoSparse
☆61 Updated 2 years ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆120 Updated last year
berlino / gated_linear_attention
☆105 Updated last year
insuhan / hyper-attn
☆83 Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48 Updated 2 years ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆131 Updated 2 years ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆101 Updated 4 months ago
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆19 Updated last year
hdong920 / GRIFFIN
☆38 Updated last year
hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆80 Updated 11 months ago
epfml / dynamic-sparse-flash-attention
☆149 Updated 2 years ago
Infini-AI-Lab / Kinetics
Kinetics: Rethinking Test-Time Scaling Laws
☆80 Updated 3 months ago
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆103 Updated last year
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28 Updated last year
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆26 Updated 2 weeks ago
andyjm3 / SLTrain
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
☆34 Updated 11 months ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87 Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆19 Updated 10 months ago
HazyResearch / fly
☆218 Updated 2 years ago
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39 Updated 2 years ago
epfml / pam
☆16 Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61 Updated last year
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆29 Updated last year
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆43 Updated last year
sustcsonglin / linear-attention-and-beyond-slides
☆91 Updated 7 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35 Updated last year
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67 Updated last year
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆70 Updated 7 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127 Updated last year