anpwu / ZJU-CS-ClassNotes
☆18 Updated 2 years ago
Alternatives and similar repositories for ZJU-CS-ClassNotes:
Users who are interested in ZJU-CS-ClassNotes are comparing it to the repositories listed below
- ☆47 Updated this week
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆37 Updated 8 months ago
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning. ☆17 Updated last month
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆253 Updated last month
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆231 Updated last year
- A paper list on LLMs and multimodal LLMs ☆26 Updated this week
- A tiny paper rating website ☆30 Updated last week
- Towards Modality Generalization: A Benchmark and Prospective Analysis ☆20 Updated last week
- [ICLR25] High-performance Image Tokenizers for VAR and AR ☆200 Updated last week
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆85 Updated 3 months ago
- ☆103 Updated last week
- Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning" ☆82 Updated 11 months ago
- LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆51 Updated this week
- ☆73 Updated 3 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆109 Updated 9 months ago
- 🔥 stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆80 Updated 7 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆61 Updated 5 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆64 Updated 8 months ago
- Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models" ☆237 Updated last month
- Idempotent Generative Network's unofficial pytorch implementation ☆45 Updated last year
- Empowering Unified MLLM with Multi-granular Visual Generation ☆117 Updated last month
- ☆29 Updated 7 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆70 Updated 4 months ago
- ☆75 Updated 2 months ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation ☆65 Updated this week
- Survey on Data-centric Large Language Models ☆77 Updated 7 months ago
- The paper collections for the autoregressive models in vision. ☆406 Updated this week
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆24 Updated 8 months ago
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆73 Updated last week