yichengchen24 / MIG
[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
☆25Updated 2 months ago
Alternatives and similar repositories for MIG
Users that are interested in MIG are comparing it to the libraries listed below
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.☆159Updated last month
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆44Updated last year
- ☆107Updated 3 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆166Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)☆176Updated 8 months ago
- Scaling Preference Data Curation via Human-AI Synergy☆125Updated 4 months ago
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆46Updated 2 years ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆96Updated 10 months ago
- ☆54Updated 8 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆102Updated last month
- ☆48Updated last year
- Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…☆48Updated last year
- [ACL 2024 Oral] This is the code repo for our ACL'24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…☆38Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models☆192Updated last year
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment☆79Updated last year
- ☆100Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆83Updated last year
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…☆43Updated 11 months ago
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models".☆107Updated 2 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆181Updated 4 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆146Updated 4 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆101Updated 5 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆91Updated last year
- Official repository of MMDU dataset☆96Updated last year
- ☆39Updated 3 months ago
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process☆30Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Updated last year
- ☆116Updated 2 weeks ago
- ☆66Updated last year
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆57Updated 10 months ago