THUDM / DataSciBench
DataSciBench: An LLM Agent Benchmark for Data Science
☆50Updated 2 weeks ago
Alternatives and similar repositories for DataSciBench
Users who are interested in DataSciBench are comparing it to the repositories listed below.
- Open source code of the paper: "OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain"☆81Updated last year
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆103Updated 5 months ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner☆30Updated last year
- official implementation of paper "Process Reward Model with Q-value Rankings"☆65Updated last year
- ☆90Updated 3 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…☆29Updated last year
- ☆31Updated last year
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆69Updated 8 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆125Updated 7 months ago
- [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)☆39Updated 5 months ago
- ☆20Updated 5 months ago
- ☆46Updated 7 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆120Updated last week
- Data and Code for EMNLP 2025 Findings Paper "MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search"☆86Updated 3 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆53Updated 8 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆28Updated last year
- SCoRe: Training Language Models to Self-Correct via Reinforcement Learning☆15Updated last year
- RL Scaling and Test-Time Scaling (ICML'25)☆113Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering☆63Updated last year
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search".☆34Updated 2 months ago
- Evaluate the Quality of Critique☆36Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆32Updated 6 months ago
- ☆76Updated 3 months ago
- ☆25Updated 9 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆87Updated 10 months ago
- ☆53Updated 11 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆65Updated last year
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting"☆90Updated last year
- ☆68Updated 4 months ago
- [EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-…☆24Updated last year