OpenHands / trajectory-visualizer
⭐44 · Updated 2 weeks ago
Alternatives and similar repositories for trajectory-visualizer
Users who are interested in trajectory-visualizer are comparing it to the repositories listed below.
- Agent computer interface for AI software engineer. ⭐115 · Updated last month
- Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ⭐53 · Updated 6 months ago
- Run SWE-bench evaluations remotely ⭐50 · Updated 5 months ago
- Small, simple agent task environments for training and evaluation ⭐19 · Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ⭐260 · Updated last week
- ⭐41 · Updated last year
- accompanying material for sleep-time compute paper ⭐119 · Updated 9 months ago
- ⭐61 · Updated 7 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ⭐69 · Updated last year
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! ⭐135 · Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⭐34 · Updated 9 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ⭐14 · Updated 9 months ago
- ⭐136 · Updated 10 months ago
- ⭐131 · Updated 8 months ago
- A system that tries to resolve all issues on a github repo with OpenHands. ⭐117 · Updated last year
- Source code for the collaborative reasoner research project at Meta FAIR. ⭐112 · Updated 9 months ago
- Compare how Agent systems perform on several benchmarks. ⭐103 · Updated 5 months ago
- Coding problems used in aider's polyglot benchmark ⭐199 · Updated last year
- Train your own SOTA deductive reasoning model ⭐107 · Updated 10 months ago
- Benchmarking Goal-Oriented Software Engineering ⭐100 · Updated 3 weeks ago
- ⭐32 · Updated last year
- Verifiers for LLM Reinforcement Learning ⭐80 · Updated 9 months ago
- Data recipes and robust infrastructure for training AI agents ⭐84 · Updated last week
- Landing page + leaderboard for SWE-Bench benchmark ⭐10 · Updated this week
- Complex Function Calling Benchmark. ⭐163 · Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ⭐59 · Updated 3 months ago
- Harbor is a framework for running agent evaluations and creating and using RL environments. ⭐488 · Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ⭐189 · Updated 10 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ⭐61 · Updated last year
- Reasoning by Communicating with Agents ⭐29 · Updated 9 months ago