This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
☆122Jun 18, 2025Updated last year
Alternatives and similar repositories for Vision-LLM-Alignment
Users that are interested in Vision-LLM-Alignment are comparing it to the libraries listed below.
- This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).☆21Jul 2, 2024Updated 2 years ago
- Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation☆30Jun 30, 2025Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- Preference Learning for LLaVA☆59Nov 9, 2024Updated last year
- An RLHF Infrastructure for Vision-Language Models☆201Nov 15, 2024Updated last year
- A repository used to organize content related to Large Speech (Audio) Models, including papers, data, applications, tools, and so on.☆28Nov 8, 2025Updated 7 months ago
- A list of conferences and journals relevant to machine translation☆33Mar 17, 2022Updated 4 years ago
- Building an inclusive, scalable, and high-performance multilingual translation model☆126May 7, 2026Updated last month
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆63Aug 23, 2024Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆53Oct 19, 2024Updated last year
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"☆14Nov 1, 2024Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆93Apr 30, 2024Updated 2 years ago
- The project for speech translation☆12Sep 28, 2023Updated 2 years ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆86Nov 10, 2024Updated last year
- We present a list of languages with their codes, families, regions, etc. We also present a list of multi-lingual corpora (with URLs).☆87Jun 2, 2021Updated 5 years ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…☆22Jan 11, 2026Updated 5 months ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆1,428May 11, 2026Updated last month
- A new multi-task learning framework using Vision Transformers☆11Jun 19, 2024Updated 2 years ago
- A benchmark for testing memorization abilities of LMs☆24Oct 15, 2024Updated last year
- An introduction to basic concepts of Transformers and key techniques of their recent advances.☆52Dec 21, 2023Updated 2 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆68May 31, 2024Updated 2 years ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆383Updated this week
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs☆308May 21, 2025Updated last year
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors".☆12Oct 11, 2024Updated last year
- Explore the Multimodal “Aha Moment” on 2B Model☆624Mar 18, 2025Updated last year
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details.☆14Mar 20, 2024Updated 2 years ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆87Oct 26, 2025Updated 8 months ago
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025)☆12Aug 11, 2025Updated 10 months ago
- ☆46Dec 30, 2024Updated last year
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models☆15Nov 1, 2024Updated last year
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆35Jul 1, 2024Updated 2 years ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆104Jan 30, 2024Updated 2 years ago
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024☆22Feb 15, 2024Updated 2 years ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆15Jun 21, 2024Updated 2 years ago
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources☆218Sep 26, 2025Updated 9 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆309Sep 11, 2024Updated last year
- Aligning LMMs with Factually Augmented RLHF☆396Nov 1, 2023Updated 2 years ago
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos☆24May 7, 2026Updated last month
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Sep 9, 2024Updated last year