YU-deep / MACT
☆18 · Updated 6 months ago
Alternatives and similar repositories for MACT
Users interested in MACT are comparing it to the libraries listed below.
- [ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow ☆35 · Updated 4 months ago
- ☆66 · Updated this week
- ☆41 · Updated last month
- A Comprehensive Dataset for Advanced Image Generation and Editing ☆31 · Updated 4 months ago
- ☆13 · Updated last year
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models. ☆19 · Updated 7 months ago
- Official Implementation for *PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling* ☆31 · Updated last month
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆28 · Updated last month
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆22 · Updated 7 months ago
- UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation ☆22 · Updated 8 months ago
- ☆19 · Updated 2 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 · Updated 2 months ago
- ☆13 · Updated last year
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆42 · Updated 7 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆26 · Updated last year
- [Preprint] Efficient Generative Model Training via Embedded Representation Warmup ☆36 · Updated 3 months ago
- Training Autoregressive Image Generation models via Reinforcement Learning ☆50 · Updated 2 months ago
- ☆18 · Updated 7 months ago
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models ☆24 · Updated this week
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆19 · Updated 11 months ago
- ☆12 · Updated 7 months ago
- EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling ☆209 · Updated this week
- ☆11 · Updated 2 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆79 · Updated 2 months ago
- [ICCV 2025] Diffusion Curriculum (DisCL) ☆17 · Updated 4 months ago
- [ICLR'25] Official repository of paper: Ranking-aware adapter for text-driven image ordering with CLIP ☆16 · Updated 9 months ago
- The code repository of UniRL ☆51 · Updated 8 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆21 · Updated last year
- [ICCV25] TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers ☆40 · Updated 6 months ago
- EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026] ☆118 · Updated this week