metauto-ai / agent-as-a-judge
🚀 Agent-as-a-Judge and DevAI dataset
⭐ 308 · Updated 3 weeks ago
Alternatives and similar repositories for agent-as-a-judge:
Users interested in agent-as-a-judge are comparing it to the repositories listed below.
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … · ⭐ 331 · Updated 7 months ago
- AWM: Agent Workflow Memory · ⭐ 231 · Updated last month
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". · ⭐ 209 · Updated 4 months ago
- Search-o1: Agentic Search-Enhanced Large Reasoning Models · ⭐ 335 · Updated this week
- [NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models · ⭐ 579 · Updated 2 weeks ago
- ⭐ 291 · Updated 9 months ago
- ⭐ 342 · Updated this week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… · ⭐ 548 · Updated 7 months ago
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… · ⭐ 380 · Updated last month
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction · ⭐ 175 · Updated this week
- A compilation of the best multi-agent papers · ⭐ 341 · Updated last week
- This is the official repository for Auto-RAG. · ⭐ 179 · Updated last week
- Framework for enhancing LLMs for RAG tasks using fine-tuning. · ⭐ 522 · Updated 3 weeks ago
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ · ⭐ 206 · Updated last month
- An Analytical Evaluation Board of Multi-turn LLM Agents · ⭐ 270 · Updated 7 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL · ⭐ 282 · Updated 3 weeks ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation · ⭐ 281 · Updated 2 months ago
- NexusRaven-13B, a new SOTA Open-Source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRav… · ⭐ 310 · Updated last year
- UGround: Universal GUI Visual Grounding for GUI Agents · ⭐ 138 · Updated this week
- Environments, tools, and benchmarks for general computer agents · ⭐ 188 · Updated 2 months ago
- ⭐ 255 · Updated last month
- ⭐ 560 · Updated this week
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… · ⭐ 565 · Updated last week
- Banishing LLM Hallucinations Requires Rethinking Generalization · ⭐ 268 · Updated 6 months ago
- OpenResearcher, an advanced Scientific Research Assistant · ⭐ 416 · Updated 3 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning · ⭐ 346 · Updated 4 months ago
- ⭐ 484 · Updated last month
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding · ⭐ 359 · Updated 11 months ago
- FireAct: Toward Language Agent Fine-tuning · ⭐ 261 · Updated last year
- ⭐ 149 · Updated 5 months ago