xbench-ai / xbench-evals
Evergreen, contamination-free, real-world, domain-specific AI evaluation framework
☆110Updated last month
Alternatives and similar repositories for xbench-evals
Users who are interested in xbench-evals are comparing it to the repositories listed below.
- ☆132Updated 7 months ago
- ☆147Updated last month
- ☆162Updated 10 months ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆129Updated 8 months ago
- Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evo…☆94Updated 4 months ago
- ☆147Updated last year
- a-m-team's exploration in large language modeling☆194Updated 6 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis☆114Updated 6 months ago
- ☆392Updated last month
- ☆96Updated 11 months ago
- ☆77Updated 10 months ago
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation"☆166Updated 8 months ago
- ☆173Updated 7 months ago
- Awesome Deep Research list! For more details, please refer to our survey paper -- A Comprehensive Survey of Deep Research: Systems, Metho…☆378Updated last month
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆668Updated last month
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆432Updated this week
- Scaling Preference Data Curation via Human-AI Synergy☆132Updated 5 months ago
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆141Updated 2 weeks ago
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the…☆174Updated 5 months ago
- ☆79Updated 6 months ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking☆105Updated 6 months ago
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆294Updated last month
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆272Updated 9 months ago
- On Memorization of Large Language Models in Logical Reasoning☆72Updated 8 months ago
- ☆64Updated 7 months ago
- ☆319Updated 6 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆154Updated 11 months ago
- ☆54Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆284Updated 2 years ago
- A One-Stop Reward Model Platform☆101Updated this week