[ICLR'25] Reconstructive Visual Instruction Tuning
☆135Apr 9, 2025Updated 10 months ago
Alternatives and similar repositories for ross
Users that are interested in ross are comparing it to the libraries listed below
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆60Nov 27, 2025Updated 3 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness☆67Jul 22, 2025Updated 7 months ago
- ☆54Jan 17, 2025Updated last year
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆426Dec 22, 2024Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better☆299Jan 23, 2025Updated last year
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆438Aug 8, 2025Updated 6 months ago
- A fork to add multimodal model training to open-r1☆1,493Feb 8, 2025Updated last year
- The official repo for the DanQing dataset.☆30Jan 16, 2026Updated last month
- ☆132Mar 22, 2025Updated 11 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.☆800Oct 10, 2025Updated 4 months ago
- CaptionQA: Is Your Caption as Useful as the Image Itself?☆32Jan 19, 2026Updated last month
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆75Jan 26, 2026Updated last month
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆164Nov 6, 2024Updated last year
- ☆46Feb 18, 2026Updated last week
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆90Oct 12, 2024Updated last year
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆349Jan 8, 2026Updated last month
- Official repo and evaluation implementation of VSI-Bench☆673Aug 5, 2025Updated 6 months ago
- Code for Cross-dataset Training☆15Dec 27, 2020Updated 5 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Oct 17, 2024Updated last year
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆226Mar 20, 2025Updated 11 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning☆234Jan 22, 2026Updated last month
- Pixel-Level Reasoning Model trained with RL [NeurIPS'25]☆278Nov 6, 2025Updated 3 months ago
- (Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators☆640Nov 10, 2025Updated 3 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆90Jul 27, 2025Updated 7 months ago
- ☆46Dec 30, 2024Updated last year
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- [ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation☆33Mar 3, 2025Updated 11 months ago
- LEO: A powerful Hybrid Multimodal LLM☆19Jan 18, 2025Updated last year
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.☆1,875Jan 8, 2026Updated last month
- ☆68Nov 5, 2025Updated 3 months ago
- Preference Learning for LLaVA☆59Nov 9, 2024Updated last year
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆108Sep 27, 2025Updated 5 months ago
- Unofficial implementation for SOLOv2 instance segmentation☆15Jun 13, 2020Updated 5 years ago
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…☆27Aug 7, 2025Updated 6 months ago
- ☆4,577Sep 14, 2025Updated 5 months ago
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C…☆279Jan 16, 2025Updated last year
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat…☆245Oct 12, 2025Updated 4 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL)☆245Apr 24, 2025Updated 10 months ago