para-lost / AutoPresent
Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)
☆59 · Updated last week
Alternatives and similar repositories for AutoPresent:
Users who are interested in AutoPresent are comparing it to the repositories listed below.
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 · Updated 8 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆68 · Updated 3 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆60 · Updated this week
- OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? [CVPR 2025] ☆34 · Updated 2 weeks ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆63 · Updated 9 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆63 · Updated 6 months ago
- Official implementation of MIA-DPO ☆52 · Updated last month
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆28 · Updated 4 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 8 months ago
- Official repo for StableLLAVA ☆94 · Updated last year
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆76 · Updated 4 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 · Updated 5 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆168 · Updated 5 months ago
- ☆38 · Updated 2 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆32 · Updated this week
- [CVPR 2025] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆146 · Updated 2 weeks ago
- Matryoshka Multimodal Models ☆97 · Updated last month
- [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs ☆141 · Updated 7 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆92 · Updated 4 months ago
- ☆95 · Updated 9 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆23 · Updated this week
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆28 · Updated 4 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆18 · Updated last month
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆54 · Updated 6 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆130 · Updated 4 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆72 · Updated last week
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆114 · Updated 11 months ago