☆87Dec 29, 2023Updated 2 years ago
Alternatives and similar repositories for nuggets
Users that are interested in nuggets are comparing it to the libraries listed below
Sorting:
- ☆148Apr 16, 2024Updated last year
- ☆24Oct 14, 2024Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆416Jun 25, 2025Updated 8 months ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆589Dec 9, 2024Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆90Nov 13, 2024Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆189Jun 25, 2025Updated 8 months ago
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning☆512Oct 20, 2024Updated last year
- Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific a…☆17Dec 25, 2024Updated last year
- ☆13Jun 26, 2024Updated last year
- ☆18Jun 3, 2024Updated last year
- ☆111Jun 15, 2025Updated 8 months ago
- ☆22Oct 22, 2024Updated last year
- Code for paper 'Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning'☆18Apr 19, 2024Updated last year
- ☆20Nov 3, 2024Updated last year
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024☆22Jun 26, 2024Updated last year
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs☆48Aug 26, 2024Updated last year
- [Findings of EMNLP'2024] Unified Active Retrieval for Retrieval Augmented Generation☆23Sep 30, 2024Updated last year
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues☆13Feb 7, 2024Updated 2 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆102Feb 20, 2025Updated last year
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"☆25Mar 28, 2024Updated last year
- ☆64Apr 9, 2024Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)☆11Apr 18, 2025Updated 10 months ago
- Regularly Truncated M-estimators for Learning with Noisy Labels☆11Apr 24, 2024Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆159Nov 2, 2024Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆285Aug 20, 2023Updated 2 years ago
- [EMNLP 2023] FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models☆95Dec 21, 2023Updated 2 years ago
- [NAACL'25] RuleR: Improving LLM Controllability by Rule-based Data Recycling☆14Sep 27, 2025Updated 5 months ago
- ICCV'2023: Holistic Label Correction for Noisy Multi-Label Classification☆13Oct 29, 2023Updated 2 years ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better☆16Feb 15, 2025Updated last year
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data☆13Jul 21, 2024Updated last year
- ☆13Jan 22, 2025Updated last year
- [CIKM 2025] Constraint Back-translation Improves Complex Instruction Following of Large Language Models☆17May 23, 2025Updated 9 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection☆55Oct 29, 2024Updated last year
- ☆31Mar 23, 2024Updated last year
- Offical code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023☆12Dec 13, 2023Updated 2 years ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆37Oct 3, 2025Updated 5 months ago
- ☆62Oct 29, 2024Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…☆30Nov 24, 2024Updated last year