OSU-NLP-Group / SeeActChromeExtension
☆16 · Updated 10 months ago
Alternatives and similar repositories for SeeActChromeExtension
Users who are interested in SeeActChromeExtension are comparing it to the libraries listed below.
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆63 · Updated 11 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆37 · Updated last year
- ☆67 · Updated 7 months ago
- ☆30 · Updated last year
- Nexusflow function call, tool use, and agent benchmarks. ☆29 · Updated 10 months ago
- ☆11 · Updated last year
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆19 · Updated 11 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆27 · Updated 10 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆41 · Updated 10 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆54 · Updated 4 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 · Updated 10 months ago
- ☆60 · Updated 4 months ago
- ☆24 · Updated last year
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆23 · Updated last year
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆19 · Updated last year
- Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆53 · Updated 10 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆78 · Updated 6 months ago
- ☆41 · Updated last year
- Run SWE-bench evaluations remotely ☆42 · Updated 2 months ago
- Enhancement in Multimodal Representation Learning. ☆40 · Updated last year
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Updated 7 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆29 · Updated 7 months ago
- ☆29 · Updated last year
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! ☆130 · Updated last month
- ☆35 · Updated 5 months ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆50 · Updated 7 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ☆31 · Updated 3 months ago
- ☆40 · Updated 10 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 · Updated 6 months ago