AnonymousDUTAI / SREKCARC-IA-TUD
☆20Updated last year
Alternatives and similar repositories for SREKCARC-IA-TUD
Users who are interested in SREKCARC-IA-TUD are comparing it to the libraries listed below.
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating…☆128Updated last month
- Official implementation of MC-LLaVA.☆140Updated 2 months ago
- This is a collection of recent papers on reasoning in video generation models.☆95Updated 3 weeks ago
- A tiny paper rating web☆39Updated 10 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆236Updated 3 weeks ago
- Fundamentals of Digital Media Technology(04713901) | Peking University ECE Course Materials☆23Updated 3 years ago
- Towards Efficient Multimodal Large Language Models: A Survey on Token Compression☆78Updated 2 weeks ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆156Updated 4 months ago
- A collection of vision foundation models unifying understanding and generation.☆59Updated last year
- [TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198☆285Updated this week
- A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines☆31Updated 4 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆34Updated 7 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆206Updated 3 months ago
- ☆58Updated 6 months ago
- [ACMMM 2025 - Dataset Track] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies☆22Updated 7 months ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs☆118Updated 7 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆113Updated last month
- Cambrian-S: Towards Spatial Supersensing in Video☆482Updated last month
- Using message app/bot to notify you when doing time-consuming tasks. Bake your experiments!☆85Updated last week
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆59Updated 2 months ago
- A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue☆314Updated 6 months ago
- ☆51Updated 5 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆79Updated 3 months ago
- Incentivizing "Thinking with Long Videos" via Native Tool Calling☆183Updated this week
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence☆54Updated 3 weeks ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆90Updated 6 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆347Updated 3 weeks ago
- A collection of awesome think with videos papers.☆83Updated 2 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding☆37Updated 10 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆148Updated last year