SalesforceAIResearch / CodeTree
Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models
☆17 Updated last week
Alternatives and similar repositories for CodeTree:
Users who are interested in CodeTree are comparing it to the repositories listed below.
- ☆62 Updated 2 weeks ago
- ☆26 Updated 2 months ago
- ☆16 Updated 6 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆28 Updated 2 months ago
- ☆24 Updated 6 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆33 Updated last year
- Small, simple agent task environments for training and evaluation ☆18 Updated 5 months ago
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 2 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated 2 weeks ago
- ☆41 Updated 3 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆17 Updated this week
- ☆76 Updated this week
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆15 Updated this week
- ☆15 Updated last week
- ☆27 Updated 2 weeks ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆42 Updated last year
- ☆48 Updated 5 months ago
- ☆14 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆21 Updated 4 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆50 Updated last month
- Repository for Skill Set Optimization ☆12 Updated 8 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆53 Updated 4 months ago
- ☆39 Updated 8 months ago
- Implementation of "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models" ☆27 Updated 2 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆86 Updated this week
- ☆41 Updated last week
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models ☆23 Updated last year
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆16 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year