TencentARC / Plot2Code
☆16 Updated 2 months ago
Related projects
Alternatives and complementary repositories for Plot2Code
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 Updated 10 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 Updated 6 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆69 Updated last month
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆52 Updated last month
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆57 Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆46 Updated 3 weeks ago
- ☆45 Updated last year
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆113 Updated last month
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs ☆41 Updated 4 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆37 Updated 3 weeks ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆22 Updated last month
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆50 Updated last month
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆16 Updated this week
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene. ☆49 Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆57 Updated 5 months ago
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track) ☆43 Updated this week
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆62 Updated 2 weeks ago
- ☆55 Updated 3 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆45 Updated last month
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆121 Updated last week
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆16 Updated this week
- ☆36 Updated 3 months ago
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan… ☆34 Updated 4 months ago
- ☆72 Updated 8 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" ☆40 Updated last month
- Research work on multimodal cognitive AI ☆56 Updated this week
- HelloBench: evaluating long text generation capabilities of LLMs ☆29 Updated 3 weeks ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆17 Updated 4 months ago
- ☆24 Updated 9 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆120 Updated 4 months ago