benediktstroebl / agent-evals
☆15 Updated 5 months ago
Alternatives and similar repositories for agent-evals:
Users who are interested in agent-evals are comparing it to the libraries listed below:
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆15 Updated last year
- ☆48 Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 6 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite. ☆33 Updated last year
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆17 Updated 2 months ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 Updated 4 months ago
- ☆45 Updated 5 months ago
- Tools for merging pretrained large language models. ☆19 Updated 9 months ago
- Aioli: A unified optimization framework for language model data mixing ☆22 Updated 2 months ago
- PyTorch implementation for MRL ☆18 Updated last year
- ☆60 Updated last month
- ☆41 Updated 3 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆21 Updated 2 weeks ago
- LLM reads a paper and produces a working prototype ☆51 Updated last week
- ☆24 Updated 6 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆32 Updated 4 months ago
- Uses a Gradio interface to stream coding related responses from local and cloud based large language models. Pulls context from GitHub Re… ☆20 Updated last week
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆33 Updated last year
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated last week
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆41 Updated 11 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆28 Updated last month
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆54 Updated 3 weeks ago
- Measuring RAG solutions throughput and latency ☆15 Updated 7 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year