BAAI-DCAI/Dataset-Pruning

Dataset pruning for ImageNet and LAION-2B.

☆80

Alternatives and similar repositories for Dataset-Pruning

Users interested in Dataset-Pruning are comparing it to the repositories listed below.

BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆77 · Mar 14, 2024 · Updated 2 years ago
BAAI-DCAI / Training-Data-Synthesis
[ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching
☆80 · Dec 9, 2023 · Updated 2 years ago
rgeirhos / dataset-pruning-metrics
Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award)
☆58 · Apr 24, 2023 · Updated 3 years ago
hrtan / MoSo
[NeurIPS-2023] The PyTorch Implementation of MoSo. The algorithms are based on our paper: "Data Pruning via Moving-one-Sample-out". MoSo …
☆10 · May 21, 2026 · Updated 2 months ago
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆167 · Jun 20, 2024 · Updated 2 years ago
VainF / Reasoning-SFT
SFT of Reasoning LLMs with Megatron-LM
☆23 · Jun 19, 2025 · Updated last year
Carol-lyh / GateControl
☆22 · Apr 3, 2026 · Updated 3 months ago
BBBiiinnn / SynArtifact
☆18 · Apr 28, 2024 · Updated 2 years ago
yu-rp / NeuralLineage
Code for CVPR 2024 Oral "Neural Lineage"
☆17 · Jun 18, 2024 · Updated 2 years ago
OPTML-Group / DP4TL
[NeurIPS2023] "Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning" by Yihua Zhang*, Yimeng Zhang*,…
☆14 · Oct 12, 2023 · Updated 2 years ago
justincui03 / tesla
☆30 · Apr 12, 2024 · Updated 2 years ago
he-y / you-only-condense-once
You Only Condense Once: Two Rules for Pruning Condensed Datasets (NeurIPS 2023)
☆17 · Nov 18, 2023 · Updated 2 years ago
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,053 · Nov 18, 2024 · Updated last year
NUS-HPC-AI-Lab / DD-Ranking
Data distillation benchmark
☆73 · Jun 13, 2025 · Updated last year
justincui03 / dc_benchmark
☆91 · Jan 22, 2023 · Updated 3 years ago
AngusDujw / FTD-distillation
The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023)
☆40 · Mar 25, 2023 · Updated 3 years ago
LINs-lab / RDED
[CVPR 2024] On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
☆85 · Feb 24, 2025 · Updated last year
jiahaolu97 / anything-unsegmentable
(CVPR 2024) "Unsegment Anything by Simulating Deformation"
☆29 · May 27, 2024 · Updated 2 years ago
furkanbiten / stvqa_amazon_ocr
STVQA and TextVQA OCR results from Amazon Text in Image pipeline
☆12 · Jul 18, 2022 · Updated 4 years ago
Yuanshi9815 / LiteFocus
[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.
☆34 · Mar 11, 2025 · Updated last year
tmllab / 2023_ICLR_Moderate-DS
☆33 · Mar 24, 2023 · Updated 3 years ago
MIV-XJTU / SPEED
PyTorch implementation of paper "Sparse Parameterization for Epitomic Dataset Distillation" in NeurIPS 2023.
☆20 · Jun 28, 2024 · Updated 2 years ago
VILA-Lab / SRe2L
(NeurIPS 2023 spotlight) Large-scale Dataset Distillation/Condensation, 50 IPC (Images Per Class) achieves the highest 60.8% on original …
☆141 · Nov 15, 2024 · Updated last year
yu-rp / Distribution-Shift-Iverson
☆42 · Sep 5, 2023 · Updated 2 years ago
Adamdad / vico
Vico: Compositional Video Generation as Flow Equalization
☆59 · Nov 15, 2024 · Updated last year
NUS-HPC-AI-Lab / InfoBatch
Lossless Training Speed Up by Unbiased Dynamic Data Pruning
☆347 · Sep 24, 2024 · Updated last year
florinshen / Vista3D
[ECCV2024] Vista3D: Unravel the 3D Darkside of a Single Image
☆57 · Sep 19, 2024 · Updated last year
AImind / Argus-3D
☆107 · Feb 20, 2024 · Updated 2 years ago
AsafShul / PoDD
Official PyTorch Implementation for the "Distilling Datasets Into Less Than One Image" paper.
☆39 · Jun 6, 2024 · Updated 2 years ago
Yanqing0327 / DREAM
Efficient Dataset Distillation by Representative Matching
☆114 · Feb 28, 2024 · Updated 2 years ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆91 · Feb 14, 2025 · Updated last year
BAAI-DCAI / MMVU
☆57 · Mar 19, 2025 · Updated last year
florinshen / PlaneDreamer
DreamGaussian with 2D-GS
☆12 · Oct 10, 2024 · Updated last year
yolky / RCIG
☆15 · Apr 25, 2023 · Updated 3 years ago
snu-mllab / Efficient-Dataset-Condensation
Official PyTorch implementation of "Dataset Condensation via Efficient Synthetic-Data Parameterization" (ICML'22)
☆115 · Oct 18, 2023 · Updated 2 years ago
PatrickZH / Awesome-Coreset-Selection
Awesome coreset/core-set/subset/sample selection works.
☆184 · Jun 30, 2024 · Updated 2 years ago
magic-research / Dataset_Quantization
[ICCV2023] Dataset Quantization
☆261 · Jan 6, 2024 · Updated 2 years ago
Guang000 / Awesome-Dataset-Distillation
A curated list of awesome papers on dataset distillation and related applications.
☆1,970 · Jul 21, 2026 · Updated last week
AndresPMD / Fine_Grained_Clf
Based on the WACV 2020 paper - Fine Grained Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
☆25 · Nov 15, 2021 · Updated 4 years ago