JieyuZ2 / TaskMeAnything
A task generation and model evaluation system.
☆51 · Updated last week
Related projects:
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan… ☆30 · Updated 2 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆47 · Updated last month
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆51 · Updated 3 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆39 · Updated 3 months ago
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs ☆38 · Updated 2 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆36 · Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆36 · Updated 5 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆100 · Updated 2 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆73 · Updated 7 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆148 · Updated 2 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆59 · Updated 3 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆56 · Updated last month
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆52 · Updated 2 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆87 · Updated 3 weeks ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆55 · Updated last week
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 · Updated 8 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆48 · Updated 4 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆89 · Updated 4 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆92 · Updated 2 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" ☆30 · Updated last month
- The official implementation of Self-Exploring Language Models (SELM) ☆55 · Updated 3 months ago
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆93 · Updated last month
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆107 · Updated last week
- Official repository for paper "GTA: A Benchmark for General Tool Agents" ☆28 · Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆67 · Updated this week
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆121 · Updated 10 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆49 · Updated 4 months ago