Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆90Nov 13, 2024Updated last year
Alternatives and similar repositories for CaR
Users who are interested in CaR are comparing it to the libraries listed below.
- ☆149Apr 16, 2024Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆188Jun 25, 2025Updated 9 months ago
- ☆88Dec 29, 2023Updated 2 years ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆416Jun 25, 2025Updated 9 months ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆591Dec 9, 2024Updated last year
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆86Dec 14, 2023Updated 2 years ago
- [EMNLP 2022] This is the code repo for our EMNLP'22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"…☆13Oct 20, 2022Updated 3 years ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning☆519Oct 20, 2024Updated last year
- [ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"☆36Feb 10, 2026Updated last month
- Unofficial implementation of AlpaGasus☆95Sep 23, 2023Updated 2 years ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆285Aug 20, 2023Updated 2 years ago
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"☆25Nov 16, 2023Updated 2 years ago
- ☆24Oct 14, 2024Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆194Mar 25, 2024Updated 2 years ago
- An introduction to basic concepts of Transformers and key techniques of their recent advances.☆52Dec 21, 2023Updated 2 years ago
- A better Alpaca Model Trained with Less Data (only 9k instructions of the original set)☆24Jul 26, 2024Updated last year
- ☆28Jan 4, 2026Updated 2 months ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Mar 20, 2024Updated 2 years ago
- ☆16Mar 22, 2024Updated 2 years ago
- An Experiment on Dynamic NTK Scaling RoPE☆64Nov 26, 2023Updated 2 years ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)☆186Feb 17, 2025Updated last year
- Official repository for "EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scena…☆19May 28, 2025Updated 9 months ago
- Recursive Abstractive Processing for Tree-Organized Retrieval☆10May 30, 2024Updated last year
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Apr 12, 2024Updated last year
- ☆12Dec 13, 2023Updated 2 years ago
- The source code used for paper "Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts", in WSDM 2023.☆14May 27, 2023Updated 2 years ago
- Entity-Aware Dual Co-Attention Network for Fake News Detection, EACL 2023 Findings☆10Jun 11, 2023Updated 2 years ago
- LongQLoRA: Extend Context Length of LLMs Efficiently☆168Nov 12, 2023Updated 2 years ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆49Mar 7, 2024Updated 2 years ago
- ☆15Oct 20, 2023Updated 2 years ago
- ☆43Mar 6, 2025Updated last year
- Undergraduate innovation project: hierarchical attention machine translation☆17Apr 12, 2021Updated 4 years ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning☆20Sep 27, 2025Updated 5 months ago
- ☆21Mar 1, 2025Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆162Nov 2, 2024Updated last year
- An ethnic culture dataset for large language models☆12May 26, 2025Updated 9 months ago
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…☆53Nov 5, 2024Updated last year
- [ACL 2024 Findings] Learning Fine-Grained Grounded Citations for Attributed Large Language Models☆20Oct 24, 2024Updated last year
- Alibaba Tianchi: 2023 Global Intelligent Vehicle AI Challenge, Track 1: LLM retrieval question answering, baseline 80+☆121Dec 28, 2023Updated 2 years ago