The repository contains code for Adaptive Data Optimization
☆32 · Dec 9, 2024 · Updated last year
Alternatives and similar repositories for ado
Users who are interested in ado are comparing it to the repositories listed below
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆13 · Jan 9, 2024 · Updated 2 years ago
- Official Code Repository for [AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆13 · Aug 8, 2025 · Updated 6 months ago
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles". ☆24 · Mar 25, 2025 · Updated 11 months ago
- Official Repository for Dataset Inference for LLMs ☆42 · Jul 25, 2024 · Updated last year
- A simple and efficient baseline for data attribution ☆11 · Nov 10, 2023 · Updated 2 years ago
- ACL24 ☆11 · Jun 7, 2024 · Updated last year
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆79 · Nov 14, 2024 · Updated last year
- A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu… ☆86 · Dec 12, 2025 · Updated 2 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆32 · Jan 23, 2025 · Updated last year
- Code and Data for the ACL 2022 paper "Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling" ☆11 · Apr 5, 2022 · Updated 3 years ago
- ☆21 · Jul 21, 2025 · Updated 7 months ago
- ☆13 · Dec 12, 2025 · Updated 2 months ago
- ☆15 · Oct 4, 2024 · Updated last year
- ☆37 · Dec 19, 2024 · Updated last year
- ☆19 · Mar 25, 2025 · Updated 11 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc… ☆32 · Jun 13, 2024 · Updated last year
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training ☆19 · Oct 12, 2024 · Updated last year
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu. ☆17 · Sep 13, 2024 · Updated last year
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition ☆19 · Nov 25, 2024 · Updated last year
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning ☆14 · Jun 3, 2025 · Updated 9 months ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆21 · May 2, 2024 · Updated last year
- An evaluation suite for Retrieval-Augmented Generation (RAG). ☆23 · Apr 26, 2025 · Updated 10 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆45 · Oct 1, 2025 · Updated 5 months ago
- ☆20 · Nov 4, 2025 · Updated 3 months ago
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models" ☆25 · Dec 12, 2023 · Updated 2 years ago
- ☆91 · Aug 18, 2024 · Updated last year
- Source code of "What can linearized neural networks actually say about generalization?" ☆20 · Oct 21, 2021 · Updated 4 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆24 · Nov 23, 2022 · Updated 3 years ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin… ☆51 · May 4, 2024 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆149 · Oct 27, 2024 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆203 · Jul 17, 2024 · Updated last year
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches ☆62 · Mar 4, 2025 · Updated 11 months ago
- ☆30 · Jun 19, 2023 · Updated 2 years ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · May 25, 2023 · Updated 2 years ago
- Official Pytorch repo of CVPR'23 and NeurIPS'23 papers on understanding replication in diffusion models. ☆113 · Nov 22, 2023 · Updated 2 years ago
- Simple and scalable tools for data-driven pretraining data selection. ☆29 · Jun 9, 2025 · Updated 8 months ago
- ☆62 · May 13, 2025 · Updated 9 months ago
- Governance of the Commons Simulation (GovSim) ☆67 · Jan 19, 2025 · Updated last year
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆66 · Apr 24, 2024 · Updated last year