AnonymousDUTAI / SREKCARC-IA-TUD
☆20Updated last year
Alternatives and similar repositories for SREKCARC-IA-TUD
Users who are interested in SREKCARC-IA-TUD are comparing it to the libraries listed below.
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating…☆127Updated 2 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆225Updated this week
- Official implementation of MC-LLaVA.☆139Updated last month
- A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue☆304Updated 5 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆193Updated 2 months ago
- Fundamentals of Digital Media Technology(04713901) | Peking University ECE Course Materials☆23Updated 3 years ago
- A tiny paper rating web☆38Updated 9 months ago
- [ACMMM 2025 - Dataset Track] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies☆20Updated 6 months ago
- Survey: https://arxiv.org/pdf/2507.20198☆243Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆84Updated 4 months ago
- ☆55Updated 4 months ago
- A collection of vision foundation models unifying understanding and generation.☆59Updated 11 months ago
- This is a collection of recent papers on reasoning in video generation models.☆83Updated this week
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding☆35Updated 8 months ago
- ☆23Updated 3 weeks ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models☆66Updated 6 months ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs☆114Updated 5 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆339Updated 2 months ago
- [arxiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning☆34Updated last month
- Code for paper: Reinforced Vision Perception with Tools☆65Updated 2 months ago
- A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines☆31Updated 3 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆151Updated 3 months ago
- A collection of awesome think with videos papers.☆73Updated 2 weeks ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆91Updated 3 weeks ago
- ☆60Updated 5 months ago
- Incentivizing "Thinking with Long Videos" via Native Tool Calling☆142Updated this week
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps☆23Updated last month
- ☆152Updated 3 weeks ago
- Cambrian-S: Towards Spatial Supersensing in Video☆429Updated this week
- Provide .bst files for NeurIPS latex template☆49Updated 8 months ago