anpwu / ZJU-CS-ClassNotes
☆16Updated 2 years ago
Alternatives and similar repositories for ZJU-CS-ClassNotes:
Users that are interested in ZJU-CS-ClassNotes are comparing it to the libraries listed below
- ☆36Updated 2 weeks ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?☆37Updated 7 months ago
- ☆97Updated last month
- A simple and flexible PyTorch implementation of StableDiffusion-3 based on diffusers for DIY and finetuning.☆16Updated last week
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆53Updated 4 months ago
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective☆30Updated this week
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆226Updated 3 weeks ago
- DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆17Updated last month
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".☆31Updated last month
- Grab GPUs☆63Updated 3 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"☆230Updated last year
- ☆25Updated 6 months ago
- Official implementation for BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way☆20Updated 3 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆115Updated this week
- A collection of vision foundation models unifying understanding and generation.☆40Updated 2 weeks ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆137Updated last week
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆106Updated 8 months ago
- Visualizing the attention of vision-language models☆97Updated 2 months ago
- A tiny paper rating web☆28Updated this week
- ICLR2024 statistics☆47Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆78Updated 4 months ago
- ☆70Updated 2 months ago
- The code for Fine-grained HBOE | AAAI 2024 (official version and optimized version).☆16Updated 9 months ago
- Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"☆219Updated 3 weeks ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Updated 2 months ago
- XQ-GAN🚀: An Open-source Image Tokenization Framework for Autoregressive Generation☆179Updated last month
- The source code for "UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All"☆36Updated 9 months ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook☆73Updated 6 months ago
- Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning"☆80Updated 10 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024)☆26Updated 2 months ago