THUDM / DataSciBench
DataSciBench: An LLM Agent Benchmark for Data Science
☆26 Updated 6 months ago
Alternatives and similar repositories for DataSciBench
Users who are interested in DataSciBench are comparing it to the repositories listed below:
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆27 Updated last year
- ☆26 Updated 4 months ago
- Code for Benchmarking Language Model Agents for Data-Driven Science ☆29 Updated 10 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA… ☆25 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆35 Updated 11 months ago
- ☆28 Updated 10 months ago
- Process Reward Models That Think ☆49 Updated last month
- ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆109 Updated this week
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs" ☆31 Updated 2 months ago
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". ☆29 Updated 2 weeks ago
- ☆70 Updated this week
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling' ☆23 Updated 3 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆30 Updated 3 weeks ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- ☆45 Updated last month
- Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https:… ☆27 Updated 2 weeks ago
- ☆19 Updated 6 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆28 Updated 8 months ago
- Source code of paper: Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning ☆33 Updated 2 months ago
- ☆22 Updated last year
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning ☆35 Updated last month
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- ☆16 Updated last year
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25 ☆51 Updated 2 months ago
- Evaluate the Quality of Critique ☆36 Updated last year
- ☆20 Updated 9 months ago
- ☆46 Updated 2 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated 8 months ago
- ☆103 Updated 8 months ago