sanyalsunny111 / Early_Weight_Avg
[COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training
☆17Updated last year
Alternatives and similar repositories for Early_Weight_Avg
Users that are interested in Early_Weight_Avg are comparing it to the libraries listed below
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆28Updated last week
- Embedding Recycling for Language models☆38Updated 2 years ago
- ☆65Updated last year
- ☆14Updated last year
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning☆35Updated 2 years ago
- Transformers at any scale☆41Updated last year
- MEXMA: Token-level objectives improve sentence representations☆42Updated 9 months ago
- ☆14Updated 3 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆49Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆43Updated last month
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆39Updated 11 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability in information retrieval models. Our foc…☆32Updated last year
- ☆16Updated last year
- ☆26Updated last year
- A package for fine tuning of pretrained NLP transformers using Semi Supervised Learning☆14Updated 4 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012☆49Updated 3 years ago
- [ACL 2023] Few-shot Reranking for Multi-hop QA via Language Model Prompting☆27Updated last week
- Adding new tasks to T0 without catastrophic forgetting