yichengchen24 / MIGLinks

[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space

☆25

Alternatives and similar repositories for MIG

Users that are interested in MIG are comparing it to the libraries listed below

Sorting:

InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆159Updated last month
ECNU-ICALK / EduChat-Math
[MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
☆44Updated last year
yegcjs / mixinglaws
☆107Updated 3 months ago
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆166Updated last year
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆176Updated 8 months ago
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆125Updated 4 months ago
SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation
☆46Updated 2 years ago
FuxiaoLiu / MMC
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
☆96Updated 10 months ago
haon-chen / mmE5
☆54Updated 8 months ago
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆102Updated last month
CMMMU-Benchmark / CMMMU
☆48Updated last year
YunxinLi / LingCloud
Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…
☆48Updated last year
OpenMatch / MARVEL
[ACL 2024 Oral] This is the code repo for our ACL‘24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…
☆38Updated last year
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆192Updated last year
QwenLM / online_merging_optimizers
Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
☆79Updated last year
vlf-silkie / VLFeedback
☆100Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83Updated last year
JUNJIE99 / VISTA_Evaluation_FineTuning
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…
☆43Updated 11 months ago
pengr / DataMan
Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models".
☆107Updated 2 months ago
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆181Updated 4 months ago
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆146Updated 4 months ago
LinWeizheDragon / FLMR
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆101Updated 5 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆91Updated last year
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆96Updated last year
multimodal-art-projection / COIG-P
☆39Updated 3 months ago
TsinghuaC3I / Intuitive-Fine-Tuning
[ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
☆30Updated last year
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78Updated last year
yhy-2000 / VideoDeepResearch
☆116Updated 2 weeks ago
thunlp / Muffin
☆66Updated last year
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆57Updated 10 months ago