anpwu / ZJU-CS-ClassNotes
☆19 · Updated 2 years ago
Alternatives and similar repositories for ZJU-CS-ClassNotes:
Users interested in ZJU-CS-ClassNotes are comparing it to the libraries listed below.
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version). ☆16 · Updated 11 months ago
- A paper list for spatial reasoning ☆51 · Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆53 · Updated this week
- ☆50 · Updated this week
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆92 · Updated 5 months ago
- Video-R1: Towards Super Reasoning Ability in Video Understanding MLLMs ☆105 · Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 · Updated 2 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆292 · Updated 3 weeks ago
- A tiny paper rating web ☆35 · Updated this week
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆109 · Updated 10 months ago
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning. ☆18 · Updated 2 months ago
- [NeurIPS 2024] DEMO: Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning ☆47 · Updated 4 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆119 · Updated 2 weeks ago
- ☆19 · Updated last month
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆44 · Updated last week
- ☆107 · Updated last month
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆83 · Updated 6 months ago
- A collection of vision foundation models unifying understanding and generation. ☆47 · Updated 2 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆18 · Updated last month
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆36 · Updated 9 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆153 · Updated 2 months ago
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆67 · Updated 6 months ago
- ☆21 · Updated 4 months ago
- Seize GPUs ☆64 · Updated 5 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025) ☆24 · Updated this week
- ☆77 · Updated 5 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆40 · Updated last month
- Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention ☆32 · Updated this week
- ☆31 · Updated 8 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆87 · Updated last week