OpenGVLab / Awesome-LLM4Tool
A curated list of papers, repositories, tutorials, and anything related to large language models for tools
☆65 · Updated last year
Related projects
Alternatives and complementary repositories for Awesome-LLM4Tool
- Touchstone: Evaluating Vision-Language Models by Language Models ☆78 · Updated 10 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆86 · Updated last month
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆56 · Updated 8 months ago
- A curated list of resources about long context in large language models and video understanding. ☆30 · Updated last year
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆43 · Updated 3 weeks ago
- ☆45 · Updated last year
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆22 · Updated last month
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 · Updated 5 months ago
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆47 · Updated last month
- ☆34 · Updated last week
- ☆65 · Updated last year
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆39 · Updated 4 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆48 · Updated 7 months ago
- ☆35 · Updated 2 months ago
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ☆46 · Updated 8 months ago
- ☆74 · Updated 8 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆36 · Updated 8 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆26 · Updated 4 months ago
- ☆58 · Updated 9 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆76 · Updated 9 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 · Updated 4 months ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆57 · Updated 7 months ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆46 · Updated 7 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆38 · Updated 7 months ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities. ☆69 · Updated last month
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆69 · Updated last week
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆55 · Updated last month
- ☆21 · Updated last month
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆113 · Updated last month
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆99 · Updated 8 months ago