babelcloud / LLM-RGB
LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
☆161 · Updated last week
Alternatives and similar repositories for LLM-RGB
Users who are interested in LLM-RGB are comparing it to the repositories listed below.
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆120 · Updated 3 months ago
- This is the official repo for "PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization". PromptAgen… ☆283 · Updated 9 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning ☆224 · Updated 4 months ago
- ☆269 · Updated 2 years ago
- ☆157 · Updated 9 months ago
- SuperCLUE-Agent: a benchmark for evaluating core agent capabilities on native Chinese tasks ☆88 · Updated last year
- ☆51 · Updated 10 months ago
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ☆142 · Updated 2 months ago
- FireAct: Toward Language Agent Fine-tuning ☆278 · Updated last year
- Evaluating tool-augmented LLMs in conversation settings ☆84 · Updated last year
- Generative Judge for Evaluating Alignment ☆238 · Updated last year
- ☆315 · Updated 8 months ago
- A curated list of autonomous agents and developer tools powered by LLM. ☆40 · Updated last year
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance ☆75 · Updated 6 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step ☆274 · Updated last year
- The evaluation benchmark on MCP servers ☆113 · Updated last week
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆318 · Updated last year
- BIBench: an LLM evaluation benchmark for the data analysis domain ☆18 · Updated last year
- A lightweight script for processing HTML page to markdown format with support for code blocks ☆79 · Updated last year
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆231 · Updated 9 months ago
- ☆142 · Updated 11 months ago
- ☆282 · Updated 10 months ago
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆178 · Updated last year
- Open Source WizardCoder Dataset ☆158 · Updated last year
- [NAACL 2025] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents ☆223 · Updated 4 months ago
- GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well a… ☆348 · Updated last year
- ☆29 · Updated 9 months ago
- ☆320 · Updated 11 months ago
- ☆172 · Updated last year
- This repository contains the paper list for the paper: Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reaso… ☆359 · Updated last year