THUDM / DataSciBench
DataSciBench: An LLM Agent Benchmark for Data Science
☆50 · Updated last week
Alternatives and similar repositories for DataSciBench
Users who are interested in DataSciBench are comparing it to the repositories listed below.
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆102 · Updated 5 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆116 · Updated this week
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆29 · Updated last year
- ☆89 · Updated 3 months ago
- ☆75 · Updated 2 months ago
- Open-source code of the paper "OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain" ☆80 · Updated last year
- ☆20 · Updated 5 months ago
- ☆46 · Updated 7 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆53 · Updated last year
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆50 · Updated last year
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆114 · Updated 3 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 · Updated last year
- Data and code for the EMNLP 2025 Findings paper "MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search" ☆84 · Updated 2 months ago
- Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits" ☆44 · Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆43 · Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆124 · Updated 7 months ago
- ☆104 · Updated last year
- A Comprehensive Library for Memory of LLM-based Agents ☆99 · Updated 8 months ago
- Source code of the paper "Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning" ☆45 · Updated 7 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆123 · Updated 5 months ago
- ☆53 · Updated 11 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆120 · Updated 8 months ago
- ☆50 · Updated 11 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ☆70 · Updated 2 months ago
- ☆46 · Updated last year
- Code and data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆89 · Updated last year
- Code for the paper "Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding" ☆88 · Updated last year
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆65 · Updated 11 months ago
- Implementation of the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA… ☆29 · Updated last year
- ☆31 · Updated last year