wl-zhao / THU-Courses
☆17Updated 3 years ago
Alternatives and similar repositories for THU-Courses
Users that are interested in THU-Courses are comparing it to the libraries listed below
- Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation☆144Updated 7 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆143Updated last year
- ElasticTok: Adaptive Tokenization for Image and Video☆87Updated last year
- A survey for visual generation alignment☆107Updated 2 months ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries☆285Updated 2 months ago
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models☆93Updated last year
- ☆11Updated 5 months ago
- ☆38Updated 2 months ago
- [CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024☆132Updated 7 months ago
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning☆16Updated 7 months ago
- CaptionQA: Is Your Caption as Useful as the Image Itself?☆28Updated last month
- PyTorch implementation of "HERO: Human Reaction Generation from Videos (ICCV 2025)"☆24Updated 4 months ago
- Official Implementation of VideoDPO☆155Updated 7 months ago
- [TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition.☆13Updated 2 years ago
- [CVPR 2024] Narrative Action Evaluation with Prompt-Guided Multimodal Interaction☆40Updated last year
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation…☆39Updated 10 months ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"☆137Updated 4 months ago
- ☆70Updated 5 months ago
- ☆31Updated last year
- SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enh…☆34Updated last year
- ☆22Updated last year
- [CVPR 2022] Official repository of AdaFocusV2.☆91Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆30Updated last year
- A framework that allows you to apply Sparse AutoEncoder on any models☆49Updated 5 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆130Updated 11 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning☆233Updated 7 months ago
- [CVPR 2025] HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation☆60Updated 6 months ago
- Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization☆58Updated 3 months ago
- Comparison between Frechet Video Distance implementation from StyleGAN-V and the original paper☆125Updated last year
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models☆34Updated last year