liuyvchi / HMAW
Hierarchical multi-agent workflow for prompt optimization
☆14Updated last year
Alternatives and similar repositories for HMAW
Users who are interested in HMAW are comparing it to the repositories listed below
- ☆73Updated last year
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆72Updated last year
- The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》☆31Updated last year
- ☆133Updated 2 years ago
- Turning to Video for Transcript Sorting☆49Updated 2 years ago
- Official repo for StableLLAVA☆95Updated 2 years ago
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs☆157Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆79Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated 2 years ago
- ☆32Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆32Updated 2 years ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds☆96Updated last year
- Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)☆17Updated last year
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs☆145Updated last year
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆19Updated 3 years ago
- OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.☆116Updated 7 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Updated 2 years ago
- ☆80Updated last year
- ☆120Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆42Updated 11 months ago
- ☆21Updated 3 years ago
- ☆11Updated last year
- ☆110Updated 3 years ago
- SVIT: Scaling up Visual Instruction Tuning☆166Updated last year
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.☆34Updated last year
- Compress conventional Vision-Language Pre-training data☆53Updated 2 years ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆54Updated 2 years ago
- A Python toolkit for the OmniLabel benchmark providing code for evaluation and visualization☆23Updated last year
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆42Updated last month