IronBeliever/CaR

Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

☆91

Alternatives and similar repositories for CaR

Users who are interested in CaR are comparing it to the repositories listed below.

CASIA-LM / MoDS
☆153 · Apr 16, 2024 · Updated 2 years ago
NiuTrans / ODEs-in-Vision-and-Language
An introduction to ODEs and their applications in vision and language
☆15 · Feb 26, 2026 · Updated 4 months ago
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆189 · Jun 25, 2025 · Updated last year
pldlgb / nuggets
☆89 · Dec 29, 2023 · Updated 2 years ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆416 · Jun 25, 2025 · Updated last year
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆600 · Dec 9, 2024 · Updated last year
NiuTrans / GRAM
Code for ICML 2025 paper "GRAM: A Generative Foundation Reward Model for Reward Generalization"
☆21 · Sep 4, 2025 · Updated 10 months ago
xypan0 / G-DIG
☆12 · Jun 30, 2024 · Updated 2 years ago
OFA-Sys / DiverseEvol
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
☆88 · Dec 14, 2023 · Updated 2 years ago
Blue-Raincoat / SelectIT
☆24 · Oct 14, 2024 · Updated last year
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆532 · Oct 20, 2024 · Updated last year
NiuTrans / MTVenues
A list of conferences and journals relevant to machine translation
☆33 · Mar 17, 2022 · Updated 4 years ago
gpt4life / alpagasus
Unofficial implementation of AlpaGasus
☆94 · Sep 23, 2023 · Updated 2 years ago
facebookresearch / dual-system-for-visual-language-reasoning
Github repo for Peifeng's internship project
☆13 · Nov 7, 2023 · Updated 2 years ago
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆287 · Aug 20, 2023 · Updated 2 years ago
heyblackC / BetterMixture-Top1-Solution
First-place (top-1) solution for the Tianchi algorithm competition "BetterMixture - Large Model Data Mixture Challenge"
☆33 · Jul 7, 2024 · Updated 2 years ago
UmeanNever / NovelSum
[ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"
☆42 · Feb 10, 2026 · Updated 5 months ago
yichengchen24 / MIG
[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Spac…
☆28 · Aug 30, 2025 · Updated 10 months ago
SqueezeAILab / LLM2LLM
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
☆196 · Mar 25, 2024 · Updated 2 years ago
NiuTrans / Introduction-to-Transformers
An introduction to basic concepts of Transformers and key techniques of their recent advances.
☆53 · Dec 21, 2023 · Updated 2 years ago
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆65 · Nov 26, 2023 · Updated 2 years ago
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆194 · Feb 17, 2025 · Updated last year
itayle / diverse-demonstrations
Diverse Demonstrations Improve In-context Compositional Generalization
☆13 · Jul 7, 2023 · Updated 3 years ago
LeeeeoLiu / LLM-CRS
☆12 · Dec 13, 2023 · Updated 2 years ago
SinHanYang / Dual-CAN
Entity-Aware Dual Co-Attention Network for Fake News Detection, EACL 2023 Findings
☆10 · Jun 11, 2023 · Updated 3 years ago
zexuanqiu / CLongEval
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
☆49 · Mar 7, 2024 · Updated 2 years ago
tianyi-lab / Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
☆368 · Sep 6, 2024 · Updated last year
snowood1 / BERT-ENN
Uncertainty-Aware Reliable Text Classification (KDD 2021)
☆18 · Oct 4, 2022 · Updated 3 years ago
oriyor / reasoning-on-cots
Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought"
☆97 · Jan 21, 2024 · Updated 2 years ago
yangjianxin1 / LongQLoRA
LongQLoRA: Extent Context Length of LLMs Efficiently
☆170 · Nov 12, 2023 · Updated 2 years ago
DUTIR-Emotion-Group / CCL2025-Chinese-Hate-Speech-Detection
☆22 · Mar 1, 2025 · Updated last year
libeineu / SDT-Training
The implementation of "Shallow-to-Deep Training for Neural Machine Translation"
☆10 · Oct 26, 2020 · Updated 5 years ago
sail-sg / sdft
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
☆167 · Nov 2, 2024 · Updated last year
fairyshine / Seal-Tools
The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…
☆57 · Nov 5, 2024 · Updated last year
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆20 · Sep 27, 2025 · Updated 9 months ago
BatsResearch / nayak-aclfindings24-code
☆22 · Jul 16, 2024 · Updated 2 years ago
shuoli90 / Rank-Calibration
A repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.
☆14 · Apr 9, 2024 · Updated 2 years ago
msaveski / toxic_conversation_structure
Replication code for "The Structure of Toxic Conversations on Twitter" (WWW'21)
☆10 · May 25, 2021 · Updated 5 years ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆269 · Sep 12, 2024 · Updated last year