anpwu / ZJU-CS-ClassNotes
☆20 · Updated 2 years ago
Alternatives and similar repositories for ZJU-CS-ClassNotes
Users who are interested in ZJU-CS-ClassNotes are comparing it to the repositories listed below.
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆83 · Updated last month
- A paper list for spatial reasoning ☆58 · Updated last month
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆101 · Updated 6 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆184 · Updated this week
- A tiny paper rating web ☆36 · Updated last month
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆118 · Updated last year
- A collection of vision foundation models unifying understanding and generation. ☆55 · Updated 4 months ago
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version). ☆16 · Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆16 · Updated last week
- Video Generation Benchmark ☆22 · Updated 3 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 · Updated 3 months ago
- ☆29 · Updated 5 months ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?". ☆33 · Updated 5 months ago
- ☆30 · Updated last week
- ☆117 · Updated 3 months ago
- Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention ☆36 · Updated 3 weeks ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆51 · Updated last month
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning. ☆18 · Updated 4 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆39 · Updated 11 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆69 · Updated 7 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆84 · Updated 8 months ago
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆78 · Updated 3 weeks ago
- MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation ☆12 · Updated 2 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆29 · Updated this week
- ☆24 · Updated 3 months ago
- [NeurIPS 2024] The official implement of research paper "FreeLong : Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆44 · Updated 2 months ago
- Accepted by CVPR 2024 ☆33 · Updated 11 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆21 · Updated 3 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆73 · Updated last week
- ICLR2024 statistics ☆47 · Updated last year