All-Hands-AI / trajectory-visualizer

☆38

Alternatives and similar repositories for trajectory-visualizer

Users who are interested in trajectory-visualizer are comparing it to the repositories listed below.

zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆143 Updated 9 months ago
All-Hands-AI / openhands-aci
Agent computer interface for AI software engineer.
☆110 Updated last month
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆103 Updated 6 months ago
JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆18 Updated 11 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆97 Updated last week
dxhou / CoAct
☆30 Updated last year
yueqis / API-Based-Agent
☆58 Updated 4 months ago
agential-ai / agential
🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!
☆54 Updated 3 months ago
InternLM / SWE-Fixer
☆121 Updated 5 months ago
aorwall / moatless-tree-search
☆120 Updated 4 months ago
SWE-bench / sb-cli
Run SWE-bench evaluations remotely
☆41 Updated 2 months ago
letta-ai / sleep-time-compute
Accompanying material for the sleep-time compute paper
☆117 Updated 5 months ago
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆63 Updated 10 months ago
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆102 Updated 2 months ago
aorwall / moatless-testbeds
Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git…
☆14 Updated 6 months ago
Aider-AI / aider-swe-bench
Harness used to benchmark aider against SWE Bench benchmarks
☆76 Updated last year
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆103 Updated this week
OpenHands / agent-sdk
A clean, modular SDK for building AI agents with OpenHands V1.
☆79 Updated this week
scicode-bench / SciCode
A benchmark that challenges language models to code solutions for scientific problems
☆145 Updated last week
xlang-ai / computer-agent-arena
Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon!
☆50 Updated 6 months ago
TheDuckAI / arb
Advanced Reasoning Benchmark Dataset for LLMs
☆46 Updated last year
allenai / IFBench
☆83 Updated last week
BigComputer-Project / SWE-Arena
SWE Arena
☆35 Updated 3 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆188 Updated 7 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆77 Updated 6 months ago
asappresearch / webagents-step
☆41 Updated last year
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
SWE-agent / SWE-ReX
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
☆349 Updated this week
bradhilton / o1-chain-of-thought
o1 Chain of Thought Examples
☆33 Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆110 Updated 6 months ago