christian42mmreason / ActivationReplay
☆19Updated last month
Alternatives and similar repositories for ActivationReplay
Users that are interested in ActivationReplay are comparing it to the libraries listed below
- ☆56Updated last month
- ☆34Updated 3 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆134Updated last week
- [NeurIPS 2025 Spotlight] VisualQuality-R1 is the first open-source NR-IQA model that can accurately describe and rate image quality.☆141Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆176Updated 2 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆108Updated 3 months ago
- EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling☆191Updated last month
- Unified Multi-modal IAA Baseline and Benchmark☆91Updated last year
- Training Autoregressive Image Generation models via Reinforcement Learning☆48Updated last month
- ☆53Updated 9 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning☆78Updated 3 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆78Updated last month
- Official implementation of MIA-DPO☆69Updated 11 months ago
- [ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆184Updated 7 months ago
- The official implementation of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models"☆17Updated 9 months ago
- Assessing Context-Aware Creative Intelligence in MLLMs☆23Updated 5 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆55Updated 9 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆62Updated 10 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆114Updated 5 months ago
- ☆138Updated last year
- ☆18Updated last year
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform…☆83Updated 3 months ago
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…☆97Updated 8 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training☆98Updated 5 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆235Updated 4 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆137Updated 7 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models☆119Updated 2 months ago
- [ICCV 2025 Highlight] LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs☆19Updated last month
- ④[ECCV 2024 Oral, Comparison among Multiple Images!] A study on open-ended multi-image quality comparison: a dataset, a model and a bench…☆86Updated last year
- [NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models☆68Updated 2 months ago