EQ-bench / eqbench3
☆45Updated 5 months ago
Alternatives and similar repositories for eqbench3
Users who are interested in eqbench3 are comparing it to the repositories listed below.
- Verifiers for LLM Reinforcement Learning☆80Updated 9 months ago
- ☆131Updated 8 months ago
- accompanying material for sleep-time compute paper☆119Updated 8 months ago
- Data preparation code for CrystalCoder 7B LLM☆45Updated last year
- ☆34Updated last year
- [ACL 2025] Agentic Knowledgeable Self-awareness☆91Updated 7 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆96Updated 8 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆102Updated 4 months ago
- LLM reads a paper and produces a working prototype☆60Updated 9 months ago
- Multi-Granularity LLM Debugger [ICSE2026]☆95Updated 6 months ago
- ☆39Updated last year
- ☆45Updated last year
- ☆39Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours☆65Updated last year
- ☆96Updated last year
- LLMs as Collaboratively Edited Knowledge Bases☆46Updated last year
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models☆30Updated 9 months ago
- Data Synthesis for Deep Research Based on Semi-Structured Data☆196Updated last month
- ☆92Updated 8 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆82Updated last year
- ☆100Updated 5 months ago
- ☆63Updated last year
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL☆236Updated 3 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆124Updated 7 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆68Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Updated 9 months ago
- Evaluating tool-augmented LLMs in conversation settings☆88Updated last year
- Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quant, Unsloth. Also possible to train LoRA over…☆226Updated this week
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer☆45Updated last year
- ☆55Updated last year