anpwu / ZJU-CS-ClassNotes
☆20Updated 2 years ago
Alternatives and similar repositories for ZJU-CS-ClassNotes
Users who are interested in ZJU-CS-ClassNotes are comparing it to the repositories listed below.
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆101Updated last week
- Empowering Unified MLLM with Multi-granular Visual Generation☆122Updated 4 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆107Updated 7 months ago
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version).☆16Updated last year
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆218Updated this week
- ICLR2024 statistics☆47Updated last year
- A collection of vision foundation models unifying understanding and generation.☆54Updated 5 months ago
- ☆119Updated 3 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆44Updated 3 months ago
- [ICML 2025] DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization☆11Updated last week
- [CVPR 2025] T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation☆83Updated this week
- MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation☆12Updated 3 months ago
- Official PyTorch implementation for "Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data" (ICLR…☆46Updated this week
- A Collection of Papers on Diffusion Language Models☆60Updated this week
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google☆55Updated 9 months ago
- A tiny paper rating web☆37Updated 2 months ago
- ☆24Updated 3 months ago
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs (NeurIPS 2024)☆16Updated 7 months ago
- ☆21Updated 7 months ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"☆47Updated 2 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆125Updated last year
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics☆106Updated 3 weeks ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR"☆151Updated 2 months ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).☆38Updated last year
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"☆237Updated last year
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆53Updated last week
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆117Updated 2 weeks ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.☆75Updated last month
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?☆40Updated 11 months ago
- A paper list for spatial reasoning☆73Updated last week