THUDM/DataSciBench

DataSciBench: An LLM Agent Benchmark for Data Science (Findings of ACL 2026)

☆64

Alternatives and similar repositories for DataSciBench

Users who are interested in DataSciBench are comparing it to the repositories listed below.

LiqiangJing / DSBench
[ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?
☆125 · Aug 17, 2025 · Updated 11 months ago
TableBench / TableBench
Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering"
☆92 · May 8, 2025 · Updated last year
ServiceNow / AgentAda
Agent ADA is a comprehensive evaluation and data analytics framework focused on insights generation and skills assessment.
☆15 · Aug 19, 2025 · Updated 11 months ago
THUDM / SciGLM
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024)
☆88 · Feb 25, 2024 · Updated 2 years ago
Rafa-zy / QLASS
☆53 · Aug 24, 2025 · Updated 10 months ago
smallporridge / TrustworthyRAG
☆16 · May 18, 2026 · Updated 2 months ago
rapidsai / legate-boost
GBM implementation on Legate
☆14 · Jul 10, 2026 · Updated last week
om-ai-lab / open-agent-leaderboard
Reproducible Language Agent Research
☆36 · Jun 25, 2025 · Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆40 · Jun 10, 2024 · Updated 2 years ago
all-the-noises / eval-arena
☆34 · Mar 21, 2026 · Updated 3 months ago
mitdbg / Kramabench
A repository for the Kramabench benchmark
☆68 · Updated this week
YuyaoZhangQAQ / QCompiler
This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.
☆17 · Oct 20, 2025 · Updated 9 months ago
google-research / chain-of-table
Code for paper Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
☆94 · Jun 18, 2024 · Updated 2 years ago
QwenLM / Confident-Decoding
☆31 · Jun 30, 2026 · Updated 3 weeks ago
DeepMathLLM / DeepMath
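An open-source mathematical large-model project that aims to explore whether large models possess mathematical creativity, as well as their potential in frontier mathematical research.
☆21 · Mar 19, 2026 · Updated 4 months ago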
NJUNLP / Hallu-PI
The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …
☆11 · Sep 27, 2024 · Updated last year
atschalz / tabprep
TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks
☆19 · Jun 7, 2026 · Updated last month
bigai-nlco / TongSearch-QR
[EMNLP 2025] TongSearch-QR
☆44 · Dec 4, 2025 · Updated 7 months ago
gautierdag / plancraft
Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs
☆30 · Nov 7, 2025 · Updated 8 months ago
Open-Social-World / autolibra
AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback
☆19 · Apr 23, 2026 · Updated 2 months ago
statisticsnorway / ssb-ssbtools
R-package with Algorithms and Tools for Tabular Statistics and Hierarchical Computations
☆12 · May 22, 2026 · Updated last month
SonghuaHu-UMD / MultiSTGraph
A Multi-graph Multi-head Adaptive Temporal Graph Convolutional Network
☆11 · May 21, 2023 · Updated 3 years ago
Table-R1 / Table-R1
[EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning"
☆32 · Jun 3, 2025 · Updated last year
KodCode-AI / code-r1
Reproducing R1 for Code with Reliable Rewards
☆13 · Apr 9, 2025 · Updated last year
ruiqi-zhong / nlparam
Augmenting Statistical Models with Natural Language Parameters
☆28 · Sep 17, 2024 · Updated last year
momo-journey / mbart-chinese
Chinese generation tasks with the multilingual denoising pre-trained model MBart
☆11 · May 27, 2021 · Updated 5 years ago
MetaCopilot / dseval
☆33 · Jun 24, 2024 · Updated 2 years ago
THUDM / WinGNN
☆10 · May 18, 2023 · Updated 3 years ago
ZJU-REAL / HBPO
☆34 · Aug 11, 2025 · Updated 11 months ago
stasl0217 / FuzzQE-code
☆13 · Jun 14, 2022 · Updated 4 years ago
SIMONLQY / RethinkMCTS
☆34 · Oct 2, 2024 · Updated last year
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆68 · May 31, 2024 · Updated 2 years ago
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆15 · Oct 16, 2023 · Updated 2 years ago
viniciusferrao / cloysterhpc
Cloyster HPC is a turnkey HPC cluster solution with a user-friendly installer
☆10 · Apr 16, 2026 · Updated 3 months ago
plageon / HierSearch
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
☆40 · Oct 9, 2025 · Updated 9 months ago
zjunlp / WorldMind
Aligning Agentic World Models via Knowledgeable Experience Learning
☆37 · May 15, 2026 · Updated 2 months ago
lichengliu03 / unary-feedback
☆44 · Mar 31, 2026 · Updated 3 months ago
sfeng-m / REAL4MWP
Code for EMNLP 2021 Paper "Recall and Learn: A Memory-augmented Solver for Math Word Problems".
☆16 · Oct 20, 2022 · Updated 3 years ago
DaoD / SPRING
[AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models
☆26 · Sep 24, 2025 · Updated 9 months ago