JeffreyXiang / MSRA-Intern-s-Toolkit
☆17Updated 7 months ago
Alternatives and similar repositories for MSRA-Intern-s-Toolkit
Users who are interested in MSRA-Intern-s-Toolkit are comparing it to the libraries listed below.
- ☆111Updated last week
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation☆106Updated last month
- Code release for paper "Test-Time Training Done Right"☆103Updated this week
- Dataset splits and evaluation code for the paper "Benchmark for Compositional Text-to-Image Synthesis" (NeurIPS 2021)☆46Updated 3 years ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆107Updated 7 months ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning☆178Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆105Updated this week
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics☆106Updated last month
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)☆34Updated 3 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆124Updated 4 months ago
- A list of works on video generation towards world model☆113Updated this week
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning☆112Updated last week
- Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"☆33Updated 3 months ago
- ☆129Updated 5 months ago
- A collection of vision foundation models unifying understanding and generation.☆55Updated 5 months ago
- [ECCV 2024] Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models☆110Updated 6 months ago
- ☆26Updated last week
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"☆86Updated 7 months ago
- Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, …☆30Updated 3 months ago
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective☆44Updated 4 months ago
- A Video Tokenizer Evaluation Dataset☆120Updated 4 months ago
- Official PyTorch implementation for "Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data" (ICLR…☆46Updated last week
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs (NeurIPS 2024)☆16Updated 7 months ago
- Code for D-DiT☆33Updated 2 months ago
- [ArXiv 2025] WorldMem: Long-term Consistent World Simulation with Memory☆161Updated last week
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆22Updated 9 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆55Updated last month
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?☆122Updated 3 months ago
- Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP".☆63Updated 2 weeks ago
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites.☆110Updated this week