christian42mmreason / ActivationReplay
☆19Updated last month
Alternatives and similar repositories for ActivationReplay
Users that are interested in ActivationReplay are comparing it to the libraries listed below
- ☆64Updated 2 months ago
- [ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow☆34Updated 3 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆139Updated last month
- [NeurIPS 2025 Spotlight] VisualQuality-R1 is the first open-source NR-IQA model that can accurately describe and rate image quality.☆151Updated 3 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆108Updated 4 months ago
- [NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning☆61Updated 3 weeks ago
- Training Autoregressive Image Generation models via Reinforcement Learning☆49Updated 2 months ago
- ☆53Updated 10 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training☆100Updated 6 months ago
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…☆104Updated 9 months ago
- Official implementation of MIA-DPO☆70Updated last year
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆182Updated 2 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to the Text-to-image Generation field and achieve the SoTA bench…☆85Updated last week
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆78Updated 2 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆185Updated 8 months ago
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation☆12Updated last month
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆114Updated 6 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆42Updated 9 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆235Updated 5 months ago
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆42Updated 3 months ago
- EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling☆207Updated 2 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models☆124Updated 3 months ago
- Unified Multi-modal IAA Baseline and Benchmark☆92Updated last year
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆94Updated last year
- The official implementation of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models"☆17Updated 10 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning☆78Updated 4 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆55Updated 10 months ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆58Updated last week
- WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction☆57Updated 4 months ago
- Assessing Context-Aware Creative Intelligence in MLLMs☆23Updated 6 months ago