ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆30 · Updated last month
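As a rough illustration of the idea named in the title, EVA-style methods initialize LoRA adapters from an SVD of layer activations rather than at random, so that the adapter starts aligned with the directions of highest explained variance. The sketch below is a minimal, hedged approximation, not the repository's implementation: the function name `eva_style_init`, the centering step, and the assumption that the layer is square (`d_out == d_in`) are all illustrative choices.

```python
import numpy as np

def eva_style_init(activations, rank):
    """Sketch of an explained-variance-based LoRA initialization.

    Takes a batch of layer input activations of shape (n_samples, d_in),
    computes their SVD, and uses the top-`rank` right-singular vectors to
    initialize LoRA's down-projection A. The up-projection B is zero so
    fine-tuning starts exactly at the pretrained model. Also returns the
    explained-variance ratio of the kept components, which an EVA-style
    scheme can use to redistribute ranks across layers.
    """
    X = activations - activations.mean(axis=0, keepdims=True)  # center
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:rank]                       # (rank, d_in) down-projection
    B = np.zeros((X.shape[1], rank))    # assumes d_out == d_in (illustrative)
    var = S ** 2
    explained = var[:rank].sum() / var.sum()
    return A, B, explained
```

Because B starts at zero, the adapted layer's output is unchanged before any gradient step; the explained-variance ratio gives a per-layer signal for how much of the activation spectrum a given rank captures.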
Related projects
Alternatives and complementary repositories for EVA
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" · ☆36 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆46 · Updated 2 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore · ☆19 · Updated 2 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ☆49 · Updated last year
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… · ☆34 · Updated last year
- Official implementation of "BERTs are Generative In-Context Learners" · ☆19 · Updated 5 months ago
- Triton implementation of the HyperAttention algorithm · ☆46 · Updated 11 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" · ☆44 · Updated 10 months ago
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" · ☆53 · Updated this week
- A repository for research on medium-sized language models. · ☆74 · Updated 6 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ☆95 · Updated 6 months ago
- GoldFinch and other hybrid transformer components · ☆40 · Updated 4 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation · ☆27 · Updated 4 months ago
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" · ☆78 · Updated 8 months ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention · ☆71 · Updated last month
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind · ☆112 · Updated 3 months ago
- Experiments for efforts to train a new and improved T5 · ☆76 · Updated 7 months ago
- Collection of autoregressive model implementations · ☆67 · Updated this week
- Implementation of Infini-Transformer in PyTorch · ☆104 · Updated last month
- Understanding the correlation between different LLM benchmarks · ☆29 · Updated 10 months ago