yidingjiang/ado

The repository contains code for Adaptive Data Optimization

☆37

Alternatives and similar repositories for ado

Users that are interested in ado are comparing it to the libraries listed below.

alon-albalak / online-data-mixing
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14 · Jan 9, 2024 · Updated 2 years ago
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
vasusingla / simple-data-attribution
A simple and efficient baseline for data attribution
☆11 · Nov 10, 2023 · Updated 2 years ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆80 · Nov 14, 2024 · Updated last year
daeungo1 / azure-openai-prompthon
Azure OpenAI Prompthon
☆11 · Sep 23, 2024 · Updated last year
locuslab / acr-memorization
☆41 · Dec 19, 2024 · Updated last year
TristanThrush / perplexity-correlations
Simple and scalable tools for data-driven pretraining data selection.
☆30 · Jun 9, 2025 · Updated last year
eth-lre / LLM_ICL
ACL24
☆11 · Jun 7, 2024 · Updated 2 years ago
pratyushmaini / llm_dataset_inference
Official Repository for Dataset Inference for LLMs
☆41 · Jul 25, 2024 · Updated last year
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33 · Jan 23, 2025 · Updated last year
hsouri / bob-detection
☆12 · Oct 20, 2023 · Updated 2 years ago
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆18 · Sep 10, 2024 · Updated last year
hsouri / GDP
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
☆11 · Apr 1, 2024 · Updated 2 years ago
JonasGeiping / dataaugs
☆18 · Oct 12, 2022 · Updated 3 years ago
hsouri / bob-classification
☆11 · Oct 20, 2023 · Updated 2 years ago
pietrolesci / memorisation-profiles
This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles".
☆25 · Mar 25, 2025 · Updated last year
tml-epfl / long-is-more-for-alignment
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
☆21 · May 2, 2024 · Updated 2 years ago
JonasGeiping / fullbatchtraining
Training vision models with full-batch gradient descent and regularization
☆40 · Feb 14, 2023 · Updated 3 years ago
goldblum / TruthOrBackpropaganda
An empirical investigation of deep learning theory
☆16 · Oct 3, 2019 · Updated 6 years ago
MadryLab / journey-TRAK
Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"
☆25 · Dec 12, 2023 · Updated 2 years ago
davidbrandfonbrener / color-filter-olmo
☆13 · Dec 12, 2025 · Updated 7 months ago
ethz-spylab / superhuman-ai-consistency
☆30 · Jun 19, 2023 · Updated 3 years ago
chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆57 · Aug 17, 2024 · Updated last year
AminJun / ImageNet1KBoundingBoxes
Pytorch ImageNet1k Loader with Bounding Boxes.
☆13 · Jan 23, 2022 · Updated 4 years ago
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
MadryLab / datamodels-data
Data for "Datamodels: Predicting Predictions with Training Data"
☆97 · May 25, 2023 · Updated 3 years ago
RUCBM / ICLEval
☆14 · Jun 24, 2024 · Updated 2 years ago
smitkiri / news-qa
Reading comprehension based question-answering model for news articles.
☆11 · Jun 22, 2022 · Updated 4 years ago
levilelis / h-levin
Levin tree search guided by both a policy and a heuristic function
☆19 · Jul 13, 2023 · Updated 3 years ago
somepago / DCR
Official Pytorch repo of CVPR'23 and NeurIPS'23 papers on understanding replication in diffusion models.
☆113 · Nov 22, 2023 · Updated 2 years ago
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆24 · Aug 18, 2024 · Updated last year
goldblum / free-lunch
Implementation of experiments from The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
☆17 · May 14, 2023 · Updated 3 years ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆206 · Jul 17, 2024 · Updated 2 years ago
haolunc / iGSM-Replication-physics-LLM
This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu.
☆17 · Sep 13, 2024 · Updated last year
Victorwz / LaViA
☆10 · Jul 13, 2024 · Updated 2 years ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆98 · Nov 17, 2024 · Updated last year
allenai / hybrid-preferences
Learning to route instances for Human vs AI Feedback (ACL Main '25)
☆29 · Jul 23, 2025 · Updated last year
ablghtianyi / ICL_Modular_Arithmetic
☆19 · Mar 25, 2025 · Updated last year
tml-epfl / llm-past-tense
Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]
☆79 · Jan 23, 2025 · Updated last year