open-compass / GTA
Official repository for paper "GTA: A Benchmark for General Tool Agents"
☆28 · Updated 2 months ago
Related projects:
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆46 · Updated 5 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆44 · Updated 8 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆89 · Updated 4 months ago
- 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆52 · Updated 3 weeks ago
- Code implementation of synthetic continued pretraining ☆13 · Updated this week
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆28 · Updated 7 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 · Updated 7 months ago
- ☆80 · Updated 9 months ago
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆39 · Updated 7 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆86 · Updated 3 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆55 · Updated last week
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆65 · Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" ☆121 · Updated 3 months ago
- Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆65 · Updated 6 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆33 · Updated 6 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆81 · Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆73 · Updated 2 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆32 · Updated 8 months ago
- ☆18 · Updated 3 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆73 · Updated 7 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆55 · Updated 3 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆29 · Updated 7 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆62 · Updated 3 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆54 · Updated 6 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆24 · Updated last month
- ☆13 · Updated last month
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆21 · Updated 2 months ago
- ☆79 · Updated 3 months ago
- PASTA: Post-hoc Attention Steering for LLMs ☆96 · Updated last week
- ☆25 · Updated 3 months ago