gszfwsb/Awesome-Dataset-Reduction

A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset selection).

☆61

Alternatives and similar repositories for Awesome-Dataset-Reduction

Users interested in Awesome-Dataset-Reduction are comparing it to the repositories listed below.

ZichenWen1 / EPIC
(NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation"
☆50 · Feb 11, 2026 · Updated 4 months ago
NUS-HPC-AI-Lab / DD-Ranking
Data distillation benchmark
☆73 · Jun 13, 2025 · Updated last year
gszfwsb / AutoGnothi
Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"
☆23 · Mar 4, 2025 · Updated last year
gszfwsb / NCFM
Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in C…
☆414 · Jun 3, 2026 · Updated last month
zzp1012 / SAM-in-Late-Phase
[ICLR 2025 Spotlight] Code release for "Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training"
☆19 · Feb 20, 2025 · Updated last year
he-y / you-only-condense-once
You Only Condense Once: Two Rules for Pruning Condensed Datasets (NeurIPS 2023)
☆16 · Nov 18, 2023 · Updated 2 years ago
bardisafa / PreSel
[CVPR 2025] An implementation of the paper "Pre-Instruction Data Selection for Visual Instruction Tuning"
☆17 · Jun 9, 2025 · Updated last year
wintertee / DiPE-Linear
The official implementation of paper "Disentangled Parameter-Efficient Linear Model for Long-Term Time Series Forecasting" (DASFAA 2026)
☆17 · Apr 21, 2026 · Updated 2 months ago
Selen-Suyue / MBA
[RA-L 2025 & ICRA 2026] Motion Before Action: Diffusing Object Motion as Manipulation Condition
☆73 · Nov 4, 2025 · Updated 8 months ago
Frostlinx / Socratic-Zero
Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning
☆38 · Oct 26, 2025 · Updated 8 months ago
princetonvisualai / What-is-Dataset-Distillation-Learning
☆17 · Jun 14, 2024 · Updated 2 years ago
zzp1012 / LLFC
[NeurIPS 2023] Code release for "Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity"
☆19 · Oct 19, 2023 · Updated 2 years ago
adymaharana / d2pruning
☆44 · Oct 13, 2023 · Updated 2 years ago
NUS-HPC-AI-Lab / PAD
Prioritize Alignment in Dataset Distillation
☆21 · Dec 3, 2024 · Updated last year
ZichenWen1 / DART
[EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆121 · Oct 12, 2025 · Updated 8 months ago
VILA-Lab / DELT
(CVPR 2025) Official implementation of DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation which outperforms SOTA…
☆28 · Aug 23, 2025 · Updated 10 months ago
EricJin2002 / SIME
[IROS 2025] SIME: Enhancing Policy Self-Improvement with Modal-level Exploration
☆17 · Mar 2, 2026 · Updated 4 months ago
Selen-Suyue / MetaPalace
Lets you enter a meta world of The Palace Museum
☆23 · Aug 30, 2025 · Updated 10 months ago
cage-policy / CAGE
[ICRA 2025] CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation
☆36 · Jan 14, 2025 · Updated last year
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆209 · May 1, 2026 · Updated 2 months ago
boone891214 / MEST
[NeurIPS 2021] "MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge", Geng Yuan, Xiaolong Ma, Yanzhi Wang et al…
☆17 · Mar 16, 2022 · Updated 4 years ago
skypea / DAG_No_Fear
NeurIPS 2020 Spotlight Paper
☆13 · Dec 20, 2021 · Updated 4 years ago
Saehyung-Lee / DCC
This repository is the official implementation of Dataset Condensation with Contrastive Signals (DCC), accepted at ICML 2022.
☆22 · Jun 8, 2022 · Updated 4 years ago
silicx / GoldFromOres-BiLP
Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP)
☆25 · Jul 6, 2024 · Updated 2 years ago
ZichenWen1 / DIJA
(ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"
☆79 · Feb 9, 2026 · Updated 5 months ago
uitrbn / IDM
☆28 · Jun 12, 2023 · Updated 3 years ago
mengcius / PyTorch-Learning-Rate-Scheduler
torch.optim.lr_scheduler
☆10 · Mar 17, 2020 · Updated 6 years ago
ada-shen / icCNN
This repository is a PyTorch implementation of interpretable compositional convolutional neural networks.
☆22 · May 24, 2023 · Updated 3 years ago
Chiaraplizz / ARGO1M-What-can-a-cook
☆11 · Jul 14, 2023 · Updated 2 years ago
langtech-bsc / Wikiextractor-V2
Enhanced version of WikiExtractor: a Wikipedia dump extractor
☆29 · Sep 17, 2025 · Updated 9 months ago
Adam1679 / mutan-article-net
Implementation of Mutan+ArticleNet on OKVQA
☆10 · Jan 11, 2021 · Updated 5 years ago
rise-policy / RISE
[IROS 2024] 📈 RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective
☆156 · Nov 29, 2025 · Updated 7 months ago
HotanLee / DeFT
The official implementation for paper: Vision-Language Models are Strong Noisy Label Detectors
☆18 · Mar 31, 2025 · Updated last year
xuyang-liu16 / V2Drop
[CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models
☆32 · May 27, 2026 · Updated last month
zeroQiaoba / ALIM
Official code of "ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning"
☆23 · Sep 25, 2023 · Updated 2 years ago
MCG-NJU / Video-DC
☆12 · Jul 30, 2025 · Updated 11 months ago
yueyu1030 / DABNet
This is the repository for the paper "Learning Task-Aware Effective Brain Connectivity for fMRI Analysis with Graph Neural Networks".
☆14 · Nov 22, 2023 · Updated 2 years ago
Thinklab-SJTU / UP2ME
Official implementation of our ICML 2024 paper "UP2ME: Univariate Pre-training to Multivariate Fine-tuning as a General-purpose Framework…
☆35 · May 12, 2025 · Updated last year
SecureAIAutonomyLab / MA-ToT
☆13 · Oct 31, 2024 · Updated last year