anpwu / ZJU-CS-ClassNotes
☆21Updated 3 years ago
Alternatives and similar repositories for ZJU-CS-ClassNotes
Users who are interested in ZJU-CS-ClassNotes are comparing it to the repositories listed below.
- A paper list for spatial reasoning☆94Updated 2 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆120Updated 2 weeks ago
- A collection of vision foundation models unifying understanding and generation.☆55Updated 5 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆22Updated 4 months ago
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version).☆16Updated last year
- A tiny paper-rating website☆38Updated 3 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆51Updated last week
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning.☆20Updated 3 weeks ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".☆34Updated 6 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong : Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆45Updated 4 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆85Updated 3 weeks ago
- [ICML 2025] Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"☆112Updated 8 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆246Updated this week
- ☆30Updated 6 months ago
- ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies☆14Updated this week
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆51Updated 2 weeks ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics☆119Updated last month
- Official code space for "SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development"☆29Updated 3 weeks ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆63Updated 2 weeks ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆125Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆85Updated 9 months ago
- [CVPR 2025] T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation☆86Updated 3 weeks ago
- Frequency Autoregressive Image Generation with Continuous Tokens☆79Updated 2 weeks ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆124Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation☆124Updated 5 months ago
- Accepted by CVPR 2024☆34Updated last year
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆65Updated 2 months ago
- ☆122Updated 4 months ago
- A Collection of Papers on Diffusion Language Models☆81Updated last week
- A preview version of CharmBench, a novel multimodal reasoning benchmark.☆22Updated 3 weeks ago