yichengchen24 / ACP
Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
☆22Updated 2 weeks ago
Alternatives and similar repositories for ACP:
Users who are interested in ACP are comparing it to the repositories listed below
- Liquid: Language Models are Scalable Multi-modal Generators☆60Updated last month
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆65Updated 3 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆61Updated 8 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆61Updated 5 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆95Updated last month
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆31Updated 10 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆122Updated 2 months ago
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…☆34Updated last month
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation☆27Updated last month
- ☆15Updated last week
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆75Updated this week
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆130Updated last month
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆40Updated 2 weeks ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆54Updated 7 months ago
- ☆20Updated 3 weeks ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆66Updated 3 months ago
- The official repository for the paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".☆28Updated 2 weeks ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆60Updated 3 months ago
- FQGAN: Factorized Visual Tokenization and Generation☆40Updated 3 weeks ago
- Official Implementation of VideoDPO☆37Updated 2 weeks ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆66Updated 3 months ago
- ☆114Updated 7 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆25Updated 8 months ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆17Updated 4 months ago
- ☆21Updated last year
- ICCV2023-Diffusion-Papers☆109Updated last year
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer☆36Updated 4 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆26Updated 4 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis☆83Updated 6 months ago
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆27Updated 2 months ago