uiuc-kang-lab / agentic-benchmarks
☆33 Updated last week
Alternatives and similar repositories for agentic-benchmarks
Users who are interested in agentic-benchmarks are comparing it to the repositories listed below
- ☆64 Updated last month
- Verifiers for LLM Reinforcement Learning ☆69 Updated 3 months ago
- ☆29 Updated this week
- CodeUltraFeedback: aligning large language models to coding preferences ☆71 Updated last year
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆53 Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆59 Updated 8 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆24 Updated 4 months ago
- List of papers on Self-Correction of LLMs. ☆74 Updated 7 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆92 Updated 2 months ago
- LLM reads a paper and produces a working prototype ☆58 Updated 3 months ago
- ☆45 Updated 4 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 6 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 6 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆35 Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆47 Updated 4 months ago
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind" ☆14 Updated 2 months ago
- Exploring limitations of LLM-as-a-judge ☆19 Updated 11 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated last year
- ☆53 Updated 9 months ago
- ☆73 Updated 3 weeks ago
- The repository contains generative AI analytics platform application code. ☆26 Updated 3 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆43 Updated last year
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆48 Updated 6 months ago
- ☆65 Updated last year
- ☆28 Updated 4 months ago