AnonymousDUTAI / SREKCARC-IA-TUD
☆20 · Updated 10 months ago
Alternatives and similar repositories for SREKCARC-IA-TUD
Users interested in SREKCARC-IA-TUD are comparing it to the libraries listed below.
- A collection of vision foundation models unifying understanding and generation. ☆57 · Updated 7 months ago
- A framework for a unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆113 · Updated last month
- Survey: https://arxiv.org/pdf/2507.20198 ☆69 · Updated this week
- A Vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue ☆277 · Updated last month
- Official implementation of MC-LLaVA. ☆130 · Updated 2 months ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs ☆97 · Updated last month
- ☆37 · Updated last week
- Official repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆73 · Updated 2 months ago
- ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies ☆16 · Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆55 · Updated 2 weeks ago
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models. ☆268 · Updated last week
- A tiny paper-rating web app ☆39 · Updated 4 months ago
- A paper list for spatial reasoning ☆129 · Updated 2 months ago
- Fundamentals of Digital Media Technology (04713901) | Peking University ECE course materials ☆18 · Updated 3 years ago
- [ICML2025] The code and data of the paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆117 · Updated 9 months ago
- Official implementation of VideoGen-of-Thought: step-by-step generation of multi-shot videos with minimal manual intervention ☆39 · Updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆103 · Updated 2 months ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?" ☆35 · Updated 8 months ago
- Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆79 · Updated 3 weeks ago
- A collection of the world's best computer vision labs and lecture materials. ☆14 · Updated 5 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆143 · Updated this week
- [ICCV2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆151 · Updated 2 months ago
- ☆31 · Updated last month
- ☆59 · Updated last month
- Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆131 · Updated last month
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆68 · Updated 2 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆27 · Updated 4 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆71 · Updated last month
- TStar: a unified temporal search framework for long-form video question answering ☆59 · Updated 4 months ago
- ☆99 · Updated 4 months ago