Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆91Nov 13, 2024Updated last year
Alternatives and similar repositories for CaR
Users that are interested in CaR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆89Dec 29, 2023Updated 2 years ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆415Jun 25, 2025Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆597Dec 9, 2024Updated last year
- [EMNLP 2022] This is the code repo for our EMNLP‘22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"…☆13Oct 20, 2022Updated 3 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆12Jun 30, 2024Updated 2 years ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning☆528Oct 20, 2024Updated last year
- A list of conferences and journals relevant to machine translation☆33Mar 17, 2022Updated 4 years ago
- Unofficial implementation of AlpaGasus☆94Sep 23, 2023Updated 2 years ago
- [ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"☆41Feb 10, 2026Updated 4 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆286Aug 20, 2023Updated 2 years ago
- 天池算法比赛《BetterMixture - 大模型数据混合挑战赛》的第一名top1解决方案☆33Jul 7, 2024Updated last year
- ☆24Oct 14, 2024Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆197Mar 25, 2024Updated 2 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- An introduction to basic concepts of Transformers and key techniques of their recent advances.☆52Dec 21, 2023Updated 2 years ago
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi…☆122Jun 18, 2025Updated last year
- Github repo for Peifeng's internship project☆13Nov 7, 2023Updated 2 years ago
- A better Alpaca Model Trained with Less Data (only 9k instructions of the original set)☆24Jul 26, 2024Updated last year
- [ACL 2026] R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning☆35Jan 4, 2026Updated 5 months ago
- An Experiment on Dynamic NTK Scaling RoPE☆65Nov 26, 2023Updated 2 years ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)