JeffreyXiang / MSRA-Intern-s-Toolkit
☆17Updated 7 months ago
Alternatives and similar repositories for MSRA-Intern-s-Toolkit
Users that are interested in MSRA-Intern-s-Toolkit are comparing it to the libraries listed below
- ☆161Updated this week
- Code release for paper "Test-Time Training Done Right"☆164Updated this week
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs (NeurIPS 2024)☆16Updated 8 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics☆119Updated last month
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆113Updated 8 months ago
- Dataset splits and evaluation code for the paper "Benchmark for Compositional Text-to-Image Synthesis" (NeurIPS 2021)☆46Updated 3 years ago
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective☆44Updated 5 months ago
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis☆111Updated last month
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation☆110Updated last week
- Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"☆33Updated 4 months ago
- Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP".☆69Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆87Updated 8 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆126Updated 5 months ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning☆191Updated 2 months ago
- Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, …☆30Updated 4 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆78Updated 9 months ago
- ICLR2024 statistics☆47Updated last year
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".☆34Updated 6 months ago
- A collection of vision foundation models unifying understanding and generation.☆56Updated 5 months ago
- Official PyTorch implementation for "Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data" (ICLR…☆49Updated 3 weeks ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆22Updated 10 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)☆35Updated last month
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Updated last year
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆71Updated last year
- 📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.☆15Updated last week
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites.☆121Updated last week
- [ECCV 2024] Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models☆110Updated 6 months ago
- Code for D-DiT☆41Updated 2 months ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆29Updated this week
- GaussianDreamer extension of threestudio.☆49Updated last year