☆44 · Oct 13, 2023 · Updated 2 years ago
Alternatives and similar repositories for d2pruning
Users that are interested in d2pruning are comparing it to the libraries listed below.
- Code for paper: “What Data Benefits My Classifier?” Enhancing Model Performance and Interpretability through Influence-Based Data Selecti… ☆23 · May 17, 2024 · Updated last year
- Data Valuation without Training of a Model, submitted to ICLR'23 ☆22 · Dec 30, 2022 · Updated 3 years ago
- ☆42 · Sep 21, 2023 · Updated 2 years ago
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award) ☆58 · Apr 24, 2023 · Updated 2 years ago
- Project for SNARE benchmark ☆11 · Jun 5, 2024 · Updated last year
- ☆11 · Dec 20, 2020 · Updated 5 years ago
- [ICLR 2024] "Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality" by Xuxi Chen*, Yu Yang*, Zhangyang Wang, Baha… ☆15 · May 18, 2024 · Updated last year
- ☆13 · Dec 12, 2025 · Updated 3 months ago
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆50 · Jun 30, 2025 · Updated 8 months ago
- ☆10 · Sep 13, 2022 · Updated 3 years ago
- A Survey of Dataset Refinement for Problems in Computer Vision Datasets ☆34 · Sep 12, 2025 · Updated 6 months ago
- ☆10 · Feb 6, 2025 · Updated last year
- DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models ☆13 · Nov 2, 2023 · Updated 2 years ago
- ☆32 · Mar 24, 2023 · Updated 2 years ago
- ☆27 · Mar 21, 2024 · Updated 2 years ago
- Code and data from the paper 'Human Feedback is not Gold Standard' ☆20 · Mar 6, 2026 · Updated 2 weeks ago
- Implementation of Gradient Information Optimization (GIO) for effective and scalable training data selection ☆14 · Jun 22, 2023 · Updated 2 years ago
- [ECCV 2024] Official PyTorch implementation of "HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts" ☆20 · Nov 22, 2024 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆13 · Aug 8, 2025 · Updated 7 months ago
- The official implementation of paper "Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning" (CVPR … ☆22 · Aug 20, 2024 · Updated last year
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆27 · Oct 14, 2025 · Updated 5 months ago
- Code for "Improving Translation Faithfulness of Large Language Models via Augmenting Instructions" ☆12 · Aug 26, 2023 · Updated 2 years ago
- ☆17 · Mar 23, 2025 · Updated last year
- Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile. ☆15 · Jun 3, 2023 · Updated 2 years ago
- Official PyTorch implementation of "Loss-Curvature Matching for Dataset Selection and Condensation" (AISTATS 2023) ☆22 · Mar 14, 2023 · Updated 3 years ago
- AAAI 2024, M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy ☆25 · Mar 2, 2024 · Updated 2 years ago
- Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation (CVPR24) ☆11 · Jun 16, 2024 · Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ☆232 · Nov 18, 2024 · Updated last year
- ☆11 · Feb 28, 2024 · Updated 2 years ago
- ☆51 · Jan 24, 2024 · Updated 2 years ago
- Analytical solutions for logistic regression and single-layer softmax ☆12 · Jul 29, 2021 · Updated 4 years ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆16 · Aug 11, 2023 · Updated 2 years ago
- ☆32 · May 24, 2023 · Updated 2 years ago
- Code for T-MARS data filtering ☆35 · Aug 23, 2023 · Updated 2 years ago
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks" ☆25 · Nov 16, 2023 · Updated 2 years ago
- ☆15 · Apr 13, 2023 · Updated 2 years ago
- You Only Condense Once: Two Rules for Pruning Condensed Datasets (NeurIPS 2023) ☆15 · Nov 18, 2023 · Updated 2 years ago
- ☆22 · Jul 20, 2022 · Updated 3 years ago
- A Survey on Data Selection for Language Models ☆255 · Apr 29, 2025 · Updated 10 months ago