This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
☆120 · Jun 18, 2025 · Updated 9 months ago
Alternatives and similar repositories for Vision-LLM-Alignment
Users that are interested in Vision-LLM-Alignment are comparing it to the libraries listed below.
- This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). ☆21 · Jul 2, 2024 · Updated last year
- Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation ☆28 · Jun 30, 2025 · Updated 9 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Nov 10, 2024 · Updated last year
- Preference Learning for LLaVA ☆59 · Nov 9, 2024 · Updated last year
- An RLHF Infrastructure for Vision-Language Models ☆198 · Nov 15, 2024 · Updated last year
- A repository used to organize content related to Large Speech(Audio) Model, including paper, data, applications, tools and so on. ☆28 · Nov 8, 2025 · Updated 5 months ago
- A list of conferences and journals relevant to machine translation ☆33 · Mar 17, 2022 · Updated 4 years ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆91 · Nov 13, 2024 · Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆62 · Aug 23, 2024 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆53 · Oct 19, 2024 · Updated last year
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation" ☆14 · Nov 1, 2024 · Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆93 · Apr 30, 2024 · Updated last year
- The project for speech translation ☆12 · Sep 28, 2023 · Updated 2 years ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆86 · Nov 10, 2024 · Updated last year
- We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs). ☆88 · Jun 2, 2021 · Updated 4 years ago
- A benchmark for testing memorization abilities of LMs ☆22 · Oct 15, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆22 · Jan 11, 2026 · Updated 3 months ago
- A new multi-task learning framework using Vision Transformers ☆11 · Jun 19, 2024 · Updated last year
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,394 · Feb 26, 2026 · Updated last month
- An introduction to basic concepts of Transformers and key techniques of their recent advances. ☆52 · Dec 21, 2023 · Updated 2 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆69 · May 31, 2024 · Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆383 · Feb 23, 2025 · Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆310 · May 21, 2025 · Updated 10 months ago
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ☆12 · Oct 11, 2024 · Updated last year
- Explore the Multimodal “Aha Moment” on 2B Model ☆622 · Mar 18, 2025 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆87 · Oct 26, 2025 · Updated 5 months ago
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details. ☆14 · Mar 20, 2024 · Updated 2 years ago
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025) ☆12 · Aug 11, 2025 · Updated 8 months ago
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models ☆15 · Nov 1, 2024 · Updated last year
- ☆48 · Dec 30, 2024 · Updated last year
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆35 · Jul 1, 2024 · Updated last year
- A tool for translating the content of LaTeX documents into various other natural languages (e.g., translating an arXiv paper from English… ☆454 · Mar 12, 2026 · Updated last month
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆15 · Jun 21, 2024 · Updated last year
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆218 · Sep 26, 2025 · Updated 6 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆307 · Sep 11, 2024 · Updated last year
- Aligning LMMs with Factually Augmented RLHF ☆394 · Nov 1, 2023 · Updated 2 years ago
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos ☆23 · Jan 26, 2026 · Updated 2 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆449 · May 14, 2025 · Updated 11 months ago