jianke0604 / MTLlib
[CVPR’25] PIVRG & ConsMTL
☆12 · Updated 2 weeks ago
Alternatives and similar repositories for MTLlib
Users who are interested in MTLlib are comparing it to the libraries listed below.
- [ICML 2025] The code and data of the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆112 · Updated 8 months ago
- A tiny paper rating web ☆38 · Updated 3 months ago
- A python script for downloading huggingface datasets and models. ☆19 · Updated 2 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆246 · Updated this week
- ☆34 · Updated 2 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆120 · Updated 2 weeks ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated 2 months ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆175 · Updated 3 months ago
- Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆62 · Updated 3 weeks ago
- A paper list for spatial reasoning ☆94 · Updated 2 weeks ago
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆121 · Updated last week
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆35 · Updated last month
- Official repository for VisionZip (CVPR 2025) ☆305 · Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation ☆124 · Updated 5 months ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆191 · Updated 2 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆358 · Updated last week
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆170 · Updated 3 weeks ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆253 · Updated last month
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆337 · Updated 3 months ago
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆51 · Updated 2 weeks ago
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆37 · Updated this week
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆119 · Updated last month
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning ☆14 · Updated 3 weeks ago
- ☆86 · Updated 3 months ago
- A collection of vision foundation models unifying understanding and generation. ☆55 · Updated 5 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆133 · Updated last month
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆124 · Updated last month
- A Collection of Papers on Diffusion Language Models ☆81 · Updated last week
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆144 · Updated 3 months ago
- R1-like Video-LLM for Temporal Grounding ☆101 · Updated this week