AnonymousDUTAI / SREKCARC-IA-TUD
☆20 · Updated last year
Alternatives and similar repositories for SREKCARC-IA-TUD
Users who are interested in SREKCARC-IA-TUD are comparing it to the repositories listed below.
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆129 · Updated 2 weeks ago
- A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue ☆313 · Updated 6 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆202 · Updated 3 months ago
- A tiny paper rating web ☆38 · Updated 9 months ago
- Official implementation of MC-LLaVA. ☆140 · Updated 2 months ago
- This is a collection of recent papers on reasoning in video generation models. ☆91 · Updated last week
- A collection of vision foundation models unifying understanding and generation. ☆59 · Updated last year
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆152 · Updated 4 months ago
- ☆59 · Updated 6 months ago
- ☆51 · Updated 4 months ago
- [ACMMM 2025 - Dataset Track] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies ☆21 · Updated 6 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆234 · Updated this week
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆66 · Updated this week
- BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models ☆37 · Updated 2 months ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs ☆116 · Updated 6 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆468 · Updated 2 weeks ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆33 · Updated 7 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆105 · Updated last month
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆134 · Updated last week
- A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines ☆31 · Updated 4 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆143 · Updated last year
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆102 · Updated 2 weeks ago
- This repo is the official implementation of "Euclid’s Gift: Enhancing Spatial Perception and Reasoning in Vision‑Language Models via Geom… ☆25 · Updated 2 months ago
- ☆57 · Updated 4 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆36 · Updated 9 months ago
- Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆166 · Updated last week
- [NIPS 25'] Evaluation code of paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models" ☆36 · Updated 2 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆137 · Updated 7 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 · Updated 5 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆85 · Updated 5 months ago