akhilkedia / TranformersGetStable
[ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"
☆9Updated 9 months ago
Alternatives and similar repositories for TranformersGetStable:
Users who are interested in TranformersGetStable are comparing it to the repositories listed below.
- Official code for the paper "Attention as a Hypernetwork"☆30Updated 10 months ago
- JAX Scalify: end-to-end scaled arithmetics☆16Updated 5 months ago
- Efficient Scaling laws and collaborative pretraining.☆16Updated 2 months ago
- Unofficial implementation of the paper "Exploring the Space of Key-Value-Query Models with Intention"☆11Updated last year
- ☆32Updated last year
- ☆16Updated last year
- Official implementation of ECCV24 paper: POA☆24Updated 8 months ago
- Official PyTorch implementation of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.☆13Updated 2 weeks ago
- Scaling Sparse Fine-Tuning to Large Language Models☆16Updated last year
- ☆18Updated 9 months ago
- Control LLM☆14Updated 2 weeks ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆16Updated last year
- Explorations into adversarial losses on top of autoregressive loss for language modeling☆35Updated 2 months ago
- ☆16Updated 9 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆20Updated 5 months ago
- ☆14Updated 5 months ago
- PyTorch implementation of StableMask (ICML'24)☆12Updated 9 months ago
- ☆9Updated last month
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆17Updated last year
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]☆14Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆25Updated 5 months ago
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT.☆17Updated 4 months ago
- ☆10Updated last year
- MIO: A Foundation Model on Multimodal Tokens☆25Updated 4 months ago
- [ICLR'25] Code for KaSA, an official implementation of "KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models"☆11Updated 3 months ago
- Minimal Implementation of Visual Autoregressive Modelling (VAR)☆30Updated last month
- Official Code Repository for the paper "Key-value memory in the brain"☆24Updated 2 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"☆34Updated 2 months ago
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆12Updated last month
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases"☆15Updated 2 years ago