jianke0604 / MTLlib
[CVPR’25] PIVRG & ConsMTL
☆12, updated last month
Alternatives and similar repositories for MTLlib
Users who are interested in MTLlib are comparing it to the libraries listed below.
- A tiny paper rating web ☆38, updated 3 months ago
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆653, updated this week
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆256, updated 3 weeks ago
- This is a repo to track the latest autoregressive visual generation papers. ☆369, updated 3 weeks ago
- A python script for downloading huggingface datasets and models. ☆19, updated 3 months ago
- Official repository for VisionZip (CVPR 2025) ☆319, updated last month
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆622, updated this week
- Experiment task scheduling made easy. ☆27, updated last week
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆114, updated 8 months ago
- Official implementation of UnifiedReward & UnifiedReward-Think ☆461, updated this week
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆268, updated 2 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆355, updated last week
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆698, updated last week
- Long-RL: Scaling RL to Long Sequences ☆323, updated this week
- ☆57, updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆129, updated last month
- 📚 Collection of awesome generation acceleration resources. ☆286, updated last week
- A paper list for spatial reasoning ☆121, updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆620, updated last month
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆130, updated last month
- ☆457, updated last week
- Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆69, updated last month
- A paper list of some recent works about Token Compress for Vit and VLM ☆547, updated last week
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆190, updated last month
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆193, updated 2 months ago
- Awesome Unified Multimodal Models ☆464, updated 2 weeks ago
- A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems ☆327, updated last week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆362, updated 2 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆105, updated 2 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆447, updated 6 months ago