xiamengzhou / training_trajectory_analysis
[ACL 2023]: Training Trajectories of Language Models Across Scales https://arxiv.org/pdf/2212.09803.pdf
☆22Updated last year
Related projects
Alternatives and complementary repositories for training_trajectory_analysis
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.☆62Updated 4 months ago
- Adding new tasks to T0 without catastrophic forgetting☆30Updated 2 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆56Updated last year
- Few-shot Learning with Auxiliary Data☆26Updated 11 months ago
- Influence Experiments☆35Updated last year
- Data Valuation on In-Context Examples (ACL23)☆23Updated last month
- Long Context Extension and Generalization in LLMs☆39Updated 2 months ago
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale"☆26Updated last year
- A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643☆69Updated last year
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning☆21Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆30Updated 6 months ago
- Code for the PAPA paper☆27Updated 2 years ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"☆28Updated 2 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆15Updated last year
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models"☆51Updated last year
- Tasks for describing differences between text distributions.☆16Updated 3 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆44Updated last year
- [ICML 2023] Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning☆39Updated last year
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings☆16Updated last year
- DEMix Layers for Modular Language Modeling☆53Updated 3 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆29Updated last week
- Teaching Models to Express Their Uncertainty in Words☆36Updated 2 years ago