☆44Oct 13, 2023Updated 2 years ago
Alternatives and similar repositories for d2pruning
Users that are interested in d2pruning are comparing it to the libraries listed below.
- Data Valuation without Training of a Model, submitted to ICLR'23☆22Dec 30, 2022Updated 3 years ago
- ☆42Sep 21, 2023Updated 2 years ago
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award)☆58Apr 24, 2023Updated 3 years ago
- [ICLR 2024] "Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality" by Xuxi Chen*, Yu Yang*, Zhangyang Wang, Baha…☆15May 18, 2024Updated 2 years ago
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"☆51Jun 30, 2025Updated 10 months ago
- ☆10Sep 13, 2022Updated 3 years ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]☆79Nov 14, 2024Updated last year
- [NeurIPS 2024 Spotlight] CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning.☆14Dec 12, 2024Updated last year
- ☆10Feb 6, 2025Updated last year
- DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models☆13Nov 2, 2023Updated 2 years ago
- ☆32Mar 24, 2023Updated 3 years ago
- Code and data from the paper 'Human Feedback is not Gold Standard'☆20May 5, 2026Updated 2 weeks ago
- ☆27Mar 21, 2024Updated 2 years ago
- Implementation of Gradient Information Optimization (GIO) for effective and scalable training data selection☆14Jun 22, 2023Updated 2 years ago
- ☆16Dec 21, 2023Updated 2 years ago
- [ECCV 2024] Official PyTorch implementation of "HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts"☆20Nov 22, 2024Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆14Aug 8, 2025Updated 9 months ago
- Ant Financial natural language processing competition.☆10Sep 3, 2018Updated 7 years ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆28Oct 14, 2025Updated 7 months ago
- Code for "Improving Translation Faithfulness of Large Language Models via Augmenting Instructions"☆12Aug 26, 2023Updated 2 years ago
- Code for promptCSE, EMNLP 2022☆11Apr 10, 2023Updated 3 years ago
- ☆18Mar 23, 2025Updated last year
- Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.☆16Jun 3, 2023Updated 2 years ago
- Official PyTorch implementation of "Loss-Curvature Matching for Dataset Selection and Condensation" (AISTATS 2023)☆22Mar 14, 2023Updated 3 years ago
- ☆11Feb 28, 2024Updated 2 years ago
- ☆27Jul 10, 2025Updated 10 months ago
- ☆53Jan 24, 2024Updated 2 years ago
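- Playing the WeChat game 跳一跳 ("Jump Jump") with reinforcement learning☆12Jul 10, 2022Updated 3 years ago
- Analytical solutions for logistic regression and single-layer softmax☆12Jul 29, 2021Updated 4 years ago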
- [NeurIPS 2023] Towards Free Data Selection with General-Purpose Models☆42Mar 14, 2025Updated last year
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"☆26Nov 16, 2023Updated 2 years ago
- ☆15Apr 13, 2023Updated 3 years ago
- You Only Condense Once: Two Rules for Pruning Condensed Datasets (NeurIPS 2023)☆16Nov 18, 2023Updated 2 years ago
- Official Repository of paper MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Pol…☆74Jan 26, 2026Updated 3 months ago
- A Survey on Data Selection for Language Models☆259Apr 29, 2025Updated last year
- ☆30Apr 12, 2024Updated 2 years ago
- Awesome-open-world-learning☆26Oct 19, 2021Updated 4 years ago
- Bridging Large Language Models with Scala 3 Functions☆11Aug 31, 2024Updated last year
- Papers about training data quality management for ML models.☆120May 8, 2026Updated 2 weeks ago