THUDM / DataSciBench
DataSciBench: An LLM Agent Benchmark for Data Science
☆33Updated 3 weeks ago
Alternatives and similar repositories for DataSciBench
Users who are interested in DataSciBench are comparing it to the repositories listed below.
- ☆19Updated 3 weeks ago
- ☆28Updated 11 months ago
- ☆36Updated last month
- ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning☆115Updated 3 weeks ago
- ☆26Updated 5 months ago
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆75Updated last month
- Source code of paper: Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning☆33Updated 2 months ago
- Open source code of the paper: "OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain"☆71Updated 9 months ago
- A Comprehensive Library for Memory of LLM-based Agents.☆77Updated 4 months ago
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆62Updated last month
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL☆95Updated last week
- [EMNLP 2025] Verification Engineering for RL in Instruction Following☆38Updated this week
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner☆28Updated last year
- This is the code of MMOA-RAG.☆74Updated 4 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆61Updated 7 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆29Updated last year
- ☆73Updated 3 weeks ago
- WideSearch: Benchmarking Agentic Broad Info-Seeking☆93Updated last month
- Code for paper Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding☆81Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆51Updated 3 months ago
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'☆25Updated 4 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆64Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning☆35Updated 11 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆56Updated 3 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆49Updated last year
- ☆17Updated 2 months ago
- ☆53Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆111Updated 4 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…☆26Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆48Updated 8 months ago