YujieLu10 / TIP
Multimodal-Procedural-Planning
☆92 · Updated last year
Alternatives and similar repositories for TIP:
Users interested in TIP are comparing it to the libraries listed below.
- Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning". ☆52 · Updated last year
- ☆85 · Updated last year
- ☆48 · Updated last year
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆56 · Updated 6 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆128 · Updated last year
- Official Code of IdealGPT ☆36 · Updated last year
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆19 · Updated last year
- Official repo for StableLLAVA ☆95 · Updated last year
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23) ☆52 · Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated 2 years ago
- ☆66 · Updated last year
- A curated list of the papers, repositories, tutorials, and anything related to large language models for tools ☆67 · Updated last year
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 · Updated 10 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆35 · Updated last month
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation ☆47 · Updated last year
- ☆30 · Updated last year
- Language Repository for Long Video Understanding ☆31 · Updated 10 months ago
- Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action ☆36 · Updated 2 years ago
- A curated list of resources about long-context in large language models and video understanding. ☆31 · Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆24 · Updated this week
- ☆16 · Updated 6 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆70 · Updated 5 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆42 · Updated 5 months ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding ☆29 · Updated last week
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆42 · Updated last week
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆29 · Updated 9 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 · Updated 2 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ☆50 · Updated last year
- VaLM: Visually-augmented Language Modeling. ICLR 2023. ☆56 · Updated 2 years ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆44 · Updated 10 months ago