JeffreyXiang / MSRA-Intern-s-Toolkit
☆17 · Updated 5 months ago
Alternatives and similar repositories for MSRA-Intern-s-Toolkit:
Users interested in MSRA-Intern-s-Toolkit are comparing it to the libraries listed below:
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025) ☆28 · Updated last month
- Official implementation of WorldScore: A Unified Evaluation Benchmark for World Generation ☆92 · Updated this week
- Dataset splits and evaluation code for the paper "Benchmark for Compositional Text-to-Image Synthesis" (NeurIPS 2021) ☆46 · Updated 2 years ago
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs (NeurIPS 2024) ☆16 · Updated 6 months ago
- A Video Tokenizer Evaluation Dataset ☆113 · Updated 3 months ago
- GaussianDreamer extension of threestudio. ☆49 · Updated last year
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆89 · Updated last month
- Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, … ☆28 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆69 · Updated last month
- ICLR 2024 statistics ☆47 · Updated last year
- ☆126 · Updated 3 months ago
- DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance. [CVPR 2024] Official PyTorch implementation ☆101 · Updated 8 months ago
- A comprehensive list of papers investigating physical cognition in video generation, including papers, code, and related websites. ☆71 · Updated this week
- Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers" ☆109 · Updated 2 months ago
- Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆99 · Updated 6 months ago
- A paper list that includes world models or generative video models for embodied agents. ☆22 · Updated 3 months ago
- A collection of vision foundation models unifying understanding and generation. ☆51 · Updated 3 months ago
- [ICML 2024] Compositional Image Decomposition with Diffusion Models ☆50 · Updated 9 months ago
- Official code for the paper "Text-to-Image Rectified Flow as Plug-and-Play Priors" [ICLR 2025] ☆118 · Updated last week
- A paper list for spatial reasoning ☆57 · Updated 2 weeks ago
- Code for TFG: Unified Training-Free Guidance for Diffusion Models ☆54 · Updated 2 weeks ago
- Official implementation of "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization" ☆77 · Updated last year
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆66 · Updated last month
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image" ☆22 · Updated 11 months ago
- [ECCV 2024] Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models ☆109 · Updated 4 months ago
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective ☆44 · Updated 3 months ago
- PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ☆240 · Updated this week
- ☆43 · Updated last week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆106 · Updated 3 weeks ago
- [CVPR'24] GraphDreamer: a novel framework for generating compositional 3D scenes from scene graphs. ☆179 · Updated last year