babelcloud / LLM-RGB
LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
☆156Updated this week
Alternatives and similar repositories for LLM-RGB:
Users who are interested in LLM-RGB are comparing it to the libraries listed below:
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation"☆133Updated last month
- ☆155Updated 7 months ago
- This is the official repo for "PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization". PromptAgen…☆254Updated 7 months ago
- ☆268Updated last year
- Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.☆363Updated last year
- ☆51Updated 8 months ago
- FireAct: Toward Language Agent Fine-tuning☆271Updated last year
- ☆160Updated last month
- SuperCLUE-Agent: A benchmark for evaluating core agent capabilities, based on native Chinese tasks☆83Updated last year
- Beating the GAIA benchmark with Transformers Agents. 🚀☆103Updated last month
- ☆312Updated 6 months ago
- 🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/☆316Updated 4 months ago
- Generative Judge for Evaluating Alignment☆230Updated last year
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".☆227Updated 7 months ago
- ☆142Updated 8 months ago
- ☆29Updated 6 months ago
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆224Updated 7 months ago
- ☆93Updated 3 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning☆215Updated 2 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)☆177Updated 2 weeks ago
- 🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.☆242Updated last month
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆229Updated last month
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation☆293Updated 5 months ago
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models"☆98Updated last year
- A lightweight script for processing HTML page to markdown format with support for code blocks☆79Updated 11 months ago
- Reformatted Alignment☆115Updated 6 months ago
- ⏳ ChatLog: Recording and Analysing ChatGPT Across Time☆97Updated 9 months ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and …☆338Updated 9 months ago
- This repository contains the paper list for the paper: Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reaso…☆352Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆104Updated 6 months ago