AnonymousDUTAI / SREKCARC-IA-TUD
☆20 Updated last year
Alternatives and similar repositories for SREKCARC-IA-TUD
Users who are interested in SREKCARC-IA-TUD are comparing it to the libraries listed below.
- A collection of vision foundation models unifying understanding and generation. ☆56 Updated 9 months ago
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆121 Updated last week
- [ACMMM 2025 - Dataset Track] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies ☆19 Updated 3 months ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs ☆108 Updated 3 months ago
- BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models ☆34 Updated last month
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆137 Updated last month
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆127 Updated 11 months ago
- ☆50 Updated last month
- Fundamentals of Digital Media Technology (04713901) | Peking University ECE Course Materials ☆23 Updated 3 years ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆26 Updated 4 months ago
- ☆52 Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆222 Updated last month
- A tiny paper rating web ☆39 Updated 6 months ago
- A paper list for spatial reasoning ☆143 Updated 4 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆73 Updated 3 months ago
- Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a … ☆38 Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆131 Updated this week
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆71 Updated 2 months ago
- [NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning ☆52 Updated last week
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?". ☆36 Updated 10 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆156 Updated 2 weeks ago
- [Neurips 2025 NextVid Workshop Oral✨] Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minim… ☆48 Updated 3 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆77 Updated 7 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆150 Updated 3 weeks ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆109 Updated last week
- A vue-based project page template for academic papers. (in development) https://junyaohu.github.io/academic-project-page-template-vue ☆292 Updated 3 months ago
- Official implementation of MC-LLaVA. ☆140 Updated last month
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆31 Updated 6 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆63 Updated 4 months ago
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform… ☆63 Updated last month