Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
☆34Jul 12, 2024Updated last year
Alternatives and similar repositories for AlignGPT
Users that are interested in AlignGPT are comparing it to the libraries listed below
Sorting:
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models.☆20Jul 3, 2025Updated 8 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆35Jul 15, 2025Updated 7 months ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆29Dec 24, 2025Updated 2 months ago
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆91Apr 30, 2024Updated last year
- VHTest☆15Oct 31, 2024Updated last year
- ☆16Apr 30, 2024Updated last year
- ☆18Oct 28, 2025Updated 4 months ago
- ☆16Jul 23, 2024Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research☆35Nov 26, 2025Updated 3 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆36Nov 27, 2025Updated 3 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Sep 9, 2024Updated last year
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆46Jul 17, 2025Updated 7 months ago
- [CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering☆25Apr 24, 2025Updated 10 months ago
- The official implementation of the ECCV'24 paper MC-CoT: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models w…☆26May 19, 2024Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Jun 10, 2025Updated 8 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 24, 2026Updated last week
- [CVPR 2025] CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning☆37Apr 21, 2025Updated 10 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"…☆32Nov 25, 2025Updated 3 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆45Jul 2, 2025Updated 8 months ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation☆220Mar 31, 2025Updated 11 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- Expert-level AI radiology report evaluator☆36Apr 1, 2025Updated 11 months ago
- This repository is aim to reproduce the R1-Zero on medical domain.☆32Jun 11, 2025Updated 8 months ago
- [ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging☆39Jun 4, 2025Updated 9 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆39Jun 14, 2025Updated 8 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆35Jul 1, 2024Updated last year
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆86Oct 25, 2024Updated last year
- Teeth Segmentation☆11Apr 14, 2024Updated last year
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 4 months ago
- A Framework for Evaluating AI Agent Safety in Realistic Environments☆30Oct 2, 2025Updated 5 months ago
- The official implement of paper 《DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents》☆29Oct 23, 2025Updated 4 months ago
- PyTorch implementation for "Rethinking Low-quality Optical Flow in Unsupervised Surgical Instrument Segmentation"☆10Apr 11, 2024Updated last year
- ☆11Jun 22, 2025Updated 8 months ago
- [3D GeoInfo 2025] RoofSense: A Multimodal Semantic Segmentation Dataset for Roofing Material Classification☆10Jul 3, 2025Updated 8 months ago