sanyalsunny111 / Early_Weight_Avg
Pre-train LLMs faster with Early Weight Averaging.
☆14Updated 7 months ago
Related projects:
- Code implementation of synthetic continued pretraining☆13Updated this week
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆14Updated 6 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆33Updated 6 months ago
- Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization"☆12Updated 8 months ago
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022).☆23Updated last year
- A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering☆17Updated last year
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch☆35Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆42Updated 10 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆27Updated this week
- Embedding Recycling for Language models☆38Updated last year
- InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising☆32Updated 2 months ago
- Transformers at any scale☆39Updated 8 months ago
- Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text☆18Updated 2 years ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆42Updated last week
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…☆29Updated last month
- mm-retrieval-evaluation☆10Updated 2 years ago
- Representing Rule-based Chatbots with Transformers☆17Updated 2 months ago
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆18Updated this week
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆40Updated 8 months ago
- [ACL 24 Findings] Implementation of Resonance RoPE and the PosGen synthetic dataset.☆21Updated 6 months ago
- [ICLR 2023] PyTorch code of Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees☆23Updated last year