tmllab / 2024_NeurIPS_CSGN
☆14 Updated 5 months ago
Alternatives and similar repositories for 2024_NeurIPS_CSGN:
Users who are interested in 2024_NeurIPS_CSGN are comparing it to the repositories listed below
- ICLR2024 statistics ☆47 Updated last year
- ☆19 Updated 2 years ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆28 Updated 5 months ago
- A paper list for spatial reasoning ☆57 Updated 2 weeks ago
- An ML research template with good documentation by Boyuan Chen, an MIT PhD student ☆67 Updated last month
- A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A… ☆98 Updated this week
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆48 Updated 3 weeks ago
- A collection of vision foundation models unifying understanding and generation. ☆51 Updated 3 months ago
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆71 Updated this week
- ☆31 Updated this week
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025) ☆28 Updated last month
- Recent Advances on MLLM's Reasoning Ability ☆25 Updated 2 weeks ago
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective ☆44 Updated 3 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆106 Updated last month
- ☆17 Updated last month
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆73 Updated last week
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆118 Updated last month
- Spatial-R1: The first MLLM trained using GRPO for spatial reasoning in videos ☆25 Updated last week
- The PyTorch implementation of paper: "AdaWorld: Learning Adaptable World Models with Latent Actions". ☆58 Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆80 Updated 2 weeks ago
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆100 Updated 6 months ago
- RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints ☆43 Updated 2 weeks ago
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆127 Updated last month
- ☆12 Updated 10 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆173 Updated this week
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆103 Updated 5 months ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?". ☆33 Updated 4 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆79 Updated 2 weeks ago
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training" ☆35 Updated last year
- ☆120 Updated last year