microsoft/simulated-trial-and-error

☆124

Alternatives and similar repositories for simulated-trial-and-error

Users who are interested in simulated-trial-and-error are comparing it to the libraries listed below.

JHU-CLSP / turking-bench
Web-grounded natural language instructions
☆18 · Nov 25, 2024 · Updated last year
HanNight / soft_self_consistency
Code for ACL 2024 paper "Soft Self-Consistency Improves Language Model Agents"
☆25 · Sep 11, 2024 · Updated last year
zjunlp / AutoAct
[ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
☆238 · Jan 13, 2025 · Updated last year
X-PLUG / Multi-LLM-Agent
☆242 · Apr 23, 2024 · Updated 2 years ago
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37 · Dec 29, 2024 · Updated last year
HowieHwong / MetaTool
[ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
☆115 · Mar 21, 2024 · Updated 2 years ago
InternLM / Agent-FLAN
[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
☆361 · Mar 22, 2024 · Updated 2 years ago
JoeYing1019 / UltraTool
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆71 · Aug 5, 2025 · Updated 11 months ago
magicgh / Self-MAP
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆16 · Oct 12, 2024 · Updated last year
dyabel / AnyTool
☆318 · Mar 26, 2024 · Updated 2 years ago
IBM / API-BLEND
Companion code to https://arxiv.org/abs/2402.15491
☆22 · Sep 18, 2025 · Updated 10 months ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44 · Jun 28, 2024 · Updated 2 years ago
zjunlp / KnowAgent
[NAACL 2025] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
☆260 · Jan 29, 2025 · Updated last year
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆168 · Oct 30, 2024 · Updated last year
AgentForceTeamOfficial / UA2-Agent
Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme…
☆19 · Nov 12, 2024 · Updated last year
OSU-NLP-Group / SeeActChromeExtension
☆18 · Jan 3, 2025 · Updated last year
yao8839836 / cp
☆13 · Feb 17, 2025 · Updated last year
SalesforceAIResearch / xLAM
xLAM: A Family of Large Action Models to Empower AI Agent Systems
☆634 · Jun 2, 2026 · Updated last month
microsoft / text-to-sql-schema-expansion-generalization
Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
☆13 · Jul 26, 2023 · Updated 2 years ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆237 · Apr 15, 2025 · Updated last year
Sm0kyWu / ClusteringSDF
☆11 · Sep 16, 2024 · Updated last year
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 · Feb 23, 2024 · Updated 2 years ago
fairyshine / Seal-Tools
The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar…
☆57 · Nov 5, 2024 · Updated last year
sheetagent / sheetagent.github.io
☆14 · Apr 25, 2025 · Updated last year
kyegomez / SelfExtend
Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta
☆13 · Nov 11, 2024 · Updated last year
THUNLP-MT / CODIS
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
☆13 · Oct 14, 2024 · Updated last year
anchen1011 / FireAct
FireAct: Toward Language Agent Fine-tuning
☆296 · Oct 22, 2023 · Updated 2 years ago
VITA-Group / o1-planning
[NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
☆42 · Apr 10, 2026 · Updated 3 months ago
musabgultekin / functionary
Chat language model that can interpret and execute functions/plugins
☆14 · Oct 16, 2024 · Updated last year
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆128 · May 7, 2024 · Updated 2 years ago
AgentForceTeamOfficial / Baby-AIGS
Official Implementation of the Baby-AIGS system
☆24 · Nov 25, 2024 · Updated last year
interactive-fiction-class / interactive-fiction-class.github.io
☆16 · Oct 4, 2024 · Updated last year
LINs-lab / ELICIT
[ICLR 2025] ELICIT: LLM Augmentation Via External In-context Capability
☆14 · Mar 11, 2025 · Updated last year
chtmp223 / suri
Suri: Multi-constraint instruction following for long-form text generation [EMNLP’24]
☆27 · Oct 3, 2025 · Updated 9 months ago
Ber666 / ToolkenGPT
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral)
☆271 · Apr 18, 2024 · Updated 2 years ago
THUDM / AgentTuning
AgentTuning: Enabling Generalized Agent Abilities for LLMs
☆1,500 · Oct 31, 2023 · Updated 2 years ago
clab / cnn-v1
Legacy version of CNN neural net toolkit (now called dynet)
☆19 · Oct 8, 2016 · Updated 9 years ago
heimy2000 / CMAT
☆21 · Feb 26, 2024 · Updated 2 years ago
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,586 · Feb 8, 2026 · Updated 5 months ago