Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆90Nov 13, 2024Updated last year
Alternatives and similar repositories for CaR
Users that are interested in CaR are comparing it to the libraries listed below
- ☆148Apr 16, 2024Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆189Jun 25, 2025Updated 8 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆416Jun 25, 2025Updated 8 months ago
- ☆87Dec 29, 2023Updated 2 years ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆589Dec 9, 2024Updated last year
- ☆12Jun 30, 2024Updated last year
- The source code used for paper "Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts", in WSDM 2023.☆15May 27, 2023Updated 2 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆86Dec 14, 2023Updated 2 years ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning☆512Oct 20, 2024Updated last year
- ☆24Oct 14, 2024Updated last year
- Codebase for Instruction Following without Instruction Tuning☆36Sep 24, 2024Updated last year
- Code for Robust Fine-tuning (RbFT)☆17Jan 31, 2025Updated last year
- First-place (top-1) solution to the Tianchi algorithm competition "BetterMixture - Large Model Data Mixing Challenge"☆34Jul 7, 2024Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆193Mar 25, 2024Updated last year
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Mar 20, 2024Updated last year
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Apr 12, 2024Updated last year
- An Experiment on Dynamic NTK Scaling RoPE☆64Nov 26, 2023Updated 2 years ago
- ☆34Dec 18, 2025Updated 2 months ago
- [ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"☆36Feb 10, 2026Updated 3 weeks ago
- Replication code for "The Structure of Toxic Conversations on Twitter" (WWW'21)☆10May 25, 2021Updated 4 years ago
- General-purpose simple tools project☆22Oct 6, 2024Updated last year
- ☆13Jan 22, 2025Updated last year
- ☆13Dec 13, 2023Updated 2 years ago
- ☆26Jan 4, 2026Updated 2 months ago
- Serial Contrastive Knowledge Distillation for Continual Few-shot Relation Extraction, Findings of ACL 2023☆13May 12, 2023Updated 2 years ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.☆13Apr 9, 2024Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)☆184Feb 17, 2025Updated last year
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…☆53Nov 5, 2024Updated last year
- Pretraining summarization models using a corpus of nonsense☆13Sep 28, 2021Updated 4 years ago
- Diverse Demonstrations Improve In-context Compositional Generalization☆12Jul 7, 2023Updated 2 years ago
- Recursive Abstractive Processing for Tree-Organized Retrieval☆10May 30, 2024Updated last year
- Github repo for Peifeng's internship project☆13Nov 7, 2023Updated 2 years ago
- (NBCE) Naive Bayes-based Context Extension on ChatGLM-6b☆15Jun 7, 2023Updated 2 years ago
- ☆15Oct 20, 2023Updated 2 years ago
- ☆11Nov 17, 2022Updated 3 years ago
- [EMNLP 2022] This is the code repo for our EMNLP'22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"…☆13Oct 20, 2022Updated 3 years ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA)☆15Aug 16, 2024Updated last year
- Alibaba Tianchi: 2023 Global Intelligent Vehicle AI Challenge, Track 1: AI large-model retrieval question answering, baseline 80+☆119Dec 28, 2023Updated 2 years ago
- ☆42Mar 6, 2025Updated 11 months ago