chengyou-jia / AgentStore
☆18 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for AgentStore
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 Updated last month
- Reformatted Alignment ☆112 Updated 2 months ago
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆40 Updated 9 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆48 Updated 5 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆86 Updated this week
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆54 Updated last week
- Towards Large Multimodal Models as Visual Foundation Agents ☆123 Updated last week
- ☆116 Updated 5 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆68 Updated 5 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆99 Updated 3 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆39 Updated last month
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆49 Updated 9 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 Updated 9 months ago
- ☆17 Updated 4 months ago
- augmented LLM with self reflection ☆103 Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 Updated 3 weeks ago
- ☆72 Updated 5 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆33 Updated last month
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆63 Updated last month
- ☆22 Updated 2 months ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆100 Updated 2 weeks ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆115 Updated 2 weeks ago
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use ☆70 Updated 8 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆50 Updated 7 months ago
- Official Repo for UGround ☆100 Updated 2 weeks ago
- ☆54 Updated 2 months ago
- ☆42 Updated 2 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆76 Updated 9 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆27 Updated 9 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆115 Updated this week