ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆29 · Updated 3 weeks ago
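EVA seeds each LoRA adapter from an SVD of minibatch activations, so fine-tuning starts in the subspace that explains the most variance of the layer's inputs, and ranks can be redistributed across layers according to that explained variance. A minimal PyTorch sketch of the initialization idea follows; the function name and shapes are illustrative assumptions, not this repository's API.

```python
import torch

def eva_lora_init(x: torch.Tensor, rank: int, out_features: int):
    """Sketch: seed LoRA factors from a minibatch of layer inputs.

    x: activations of shape (num_tokens, in_features).
    Returns A (rank, in_features), B (out_features, rank), and the
    explained-variance ratio of the kept components.
    """
    # Right-singular vectors of x, ordered by singular value, are the
    # input directions that explain the most activation variance.
    _, s, vh = torch.linalg.svd(x, full_matrices=False)
    explained = s.square() / s.square().sum()  # per-component explained variance
    A = vh[:rank].clone()                # top-variance directions as the down-projection
    B = torch.zeros(out_features, rank)  # zero up-projection: adapter starts as a no-op
    return A, B, explained[:rank]

# Illustrative usage: a rank-8 adapter for a 4096 -> 4096 linear layer,
# seeded from 2048 token activations.
A, B, ev = eva_lora_init(torch.randn(2048, 4096), rank=8, out_features=4096)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained weights exactly; only the choice of A changes which directions fine-tuning explores first.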
Related projects
Alternatives and complementary repositories for EVA
- ☆50 · Updated last week
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆69 · Updated this week
- A repository for research on medium-sized language models ☆74 · Updated 5 months ago
- ☆61 · Updated 2 months ago
- Code for Adaptive Data Optimization ☆18 · Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 · Updated 2 months ago
- ☆76 · Updated 6 months ago
- ☆39 · Updated 9 months ago
- Collection of autoregressive model implementations ☆66 · Updated last week
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆18 · Updated last month
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago
- ☆44 · Updated 2 months ago
- GoldFinch and other hybrid transformer components ☆39 · Updated 3 months ago
- Experiments toward training a new and improved T5 ☆76 · Updated 6 months ago
- Utilities for Training Very Large Models ☆56 · Updated last month
- ☆20 · Updated last week
- ☆30 · Updated last month
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆49 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆46 · Updated 11 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 · Updated 8 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆49 · Updated last month
- DPO, but faster 🚀 ☆20 · Updated 2 weeks ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆19 · Updated 4 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆44 · Updated 9 months ago
- Implementation of Infini-Transformer in PyTorch ☆104 · Updated last month
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Updated last year
- ☆44 · Updated 2 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 · Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆74 · Updated 2 weeks ago