xbench-ai / xbench-evals
Evergreen, contamination-free, real-world, domain-specific AI evaluation framework
☆78Updated 2 months ago
Alternatives and similar repositories for xbench-evals
Users who are interested in xbench-evals are comparing it to the repositories listed below
- ☆94Updated 3 months ago
- ☆155Updated 7 months ago
- ☆95Updated 8 months ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆122Updated 5 months ago
- ☆90Updated 2 weeks ago
- a-m-team's exploration in large language modeling☆183Updated 2 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis☆97Updated 2 months ago
- Awesome Deep Research list☆288Updated last month
- ☆145Updated last year
- ☆160Updated 3 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆222Updated last week
- Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evo…☆81Updated last month
- ☆71Updated 2 months ago
- ☆311Updated 2 months ago
- Deep Research Agent CognitiveKernel-Pro from Tencent AI Lab. Paper: https://arxiv.org/pdf/2508.00414☆303Updated last week
- MiroThinker: open-source agentic models trained for deep research and complex tool-use scenarios.☆220Updated this week
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking☆96Updated 2 months ago
- ☆63Updated 3 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆558Updated 4 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆62Updated 3 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆241Updated 6 months ago
- ☆90Updated 3 months ago
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation"☆157Updated 4 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆91Updated 6 months ago
- Miroflow is an agent framework that simplifies the development of complex, multi-agent systems. Build, manage, and scale your AI agents w…☆274Updated last week
- ☆70Updated 6 months ago
- ☆187Updated 2 months ago
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆236Updated last week
- On Memorization of Large Language Models in Logical Reasoning☆70Updated 4 months ago
- ☆83Updated last year