hud-evals / hud-sdk
HUD SDK
☆64 Updated this week
Alternatives and similar repositories for hud-sdk
Users who are interested in hud-sdk are comparing it to the libraries listed below.
- Challenges for general-purpose web-browsing AI agents ☆58 Updated 3 weeks ago
- AGI SDK ☆60 Updated this week
- ⚖️ Awesome LLM Judges ⚖️ ☆104 Updated last month
- Scale your LLM-as-a-judge. ☆239 Updated 2 weeks ago
- ☆63 Updated last month
- ☆156 Updated 3 months ago
- ☆128 Updated 3 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆80 Updated 2 weeks ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆150 Updated 4 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆68 Updated 3 months ago
- Agent computer interface for AI software engineer. ☆85 Updated this week
- A framework for optimizing DSPy programs with RL ☆75 Updated this week
- Code for ScribeAgent paper ☆58 Updated 3 months ago
- Prompt design in Python ☆60 Updated 6 months ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆46 Updated 2 months ago
- Letting Claude Code develop his own MCP tools :) ☆109 Updated 3 months ago
- ☆90 Updated 2 weeks ago
- Train your own SOTA deductive reasoning model ☆94 Updated 3 months ago
- Let Claude control a web browser on your machine. ☆30 Updated 2 weeks ago
- [NAACL2025] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications ☆93 Updated 3 weeks ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 5 months ago
- AgentTrace is a lightweight observability library to trace and evaluate agentic systems. ☆30 Updated 2 months ago
- AWM: Agent Workflow Memory ☆275 Updated 4 months ago
- ☆41 Updated 4 months ago
- ☆86 Updated 5 months ago
- ☆77 Updated 7 months ago
- ☆50 Updated 3 weeks ago
- Large Language Model (LLM) powered evaluator for Retrieval Augmented Generation (RAG) pipelines. ☆28 Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆57 Updated 6 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆80 Updated 3 months ago