divelab/Sys2Bench

Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.

☆31

Alternatives and similar repositories for Sys2Bench

Users that are interested in Sys2Bench are comparing it to the libraries listed below.

shubhamprshr27 / NeglectedTailsVLM
This repository houses the code for the paper - "The Neglected of VLMs"
☆30 · Dec 31, 2025 · Updated 6 months ago
divelab / E2H-Reasoning
[ICLR'26] Implementation of "Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning"
☆24 · May 28, 2026 · Updated last month
divelab / ATTA
Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [ICLR 2024]
☆27 · Nov 4, 2024 · Updated last year
THU-KEG / PairJudgeRM
☆15 · Apr 14, 2025 · Updated last year
open-compass / RePro
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15 · Dec 5, 2025 · Updated 7 months ago
linkedin / ControlLLM
Control LLM
☆23 · Apr 6, 2025 · Updated last year
Optimization-AI / DisCO
NeurIPS 2025: Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆53 · Mar 14, 2026 · Updated 4 months ago
weixiaolong94-hub / Beyond-React
GitHub for Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning
☆19 · Feb 27, 2026 · Updated 4 months ago
GeniusHTX / TALE
☆151 · Sep 12, 2025 · Updated 10 months ago
baskargroup / flowbench-tools
☆18 · Aug 17, 2024 · Updated last year
yih301 / LLMFP
☆32 · Apr 2, 2025 · Updated last year
Linzwcs / AFT
☆13 · Jan 22, 2025 · Updated last year
dsam99 / QueRE
Code repository for the paper on "Predicting the Performance of Black-Box LLMs through Self-Queries".
☆12 · Jan 9, 2025 · Updated last year
shengliu66 / FractionalReason
Official GitHub repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
byronBBL / Context-DPO
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆23 · Feb 17, 2025 · Updated last year
MangoKiller / SimOAR_OAR
☆11 · Nov 8, 2023 · Updated 2 years ago
hiaoxui / nugget
☆11 · Aug 1, 2024 · Updated last year
HauffQian / DGAP
☆14 · May 13, 2025 · Updated last year
divelab / LGLP
☆21 · Mar 29, 2021 · Updated 5 years ago
naitri / Multi-Agent-based-Search-and-Rescue-system-in-ROS
Multi-Agent Search and Rescue Robot in ROS
☆13 · Jul 31, 2022 · Updated 3 years ago
YujieLu10 / CLAP
☆14 · Apr 21, 2023 · Updated 3 years ago
tmlr-group / NoisyRationales
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆40 · Jul 18, 2025 · Updated last year
Yu-chen-Deng / LAPIG
[TVCG & VR'25] LAPIG: Language Guided Projector Image Generation with Surface Adaptation and Stylization
☆11 · Apr 16, 2026 · Updated 3 months ago
sheep333c / DIVE
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
☆26 · Mar 13, 2026 · Updated 4 months ago
sfeucht / footprints
https://footprints.baulab.info
☆17 · Oct 4, 2024 · Updated last year
zeyofu / ReFocus_Code
Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]
☆50 · Jul 22, 2025 · Updated 11 months ago
UKPLab / codeclarqa
Asking Clarification Questions for Code Generation in General-Purpose Programming Language
☆11 · May 26, 2023 · Updated 3 years ago
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆99 · Feb 21, 2025 · Updated last year
fsxfreak / nlp-augment
A collection of utilities used in exploring data augmentation of low-resource parallel corpuses. …
☆11 · Sep 6, 2017 · Updated 8 years ago
AslanDing / Robust-Fidelity
a robust metric (robust fidelity) for XGNN (ICLR24)
☆12 · Jun 3, 2025 · Updated last year
deepakacharyab / gnn_feature_selection_extraction
☆15 · Oct 23, 2019 · Updated 6 years ago
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆33 · May 21, 2025 · Updated last year
mlwu22 / RED
Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing
☆15 · Apr 20, 2024 · Updated 2 years ago
allenai / super-benchmark
☆53 · Apr 4, 2025 · Updated last year
ShunqiM / PM
☆14 · Apr 9, 2026 · Updated 3 months ago
OSU-NLP-Group / Explorer
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
☆29 · Feb 17, 2026 · Updated 5 months ago
mathiasj33 / deep-ltl
The official implementation of DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL (ICLR'25 Oral)
☆17 · Mar 30, 2025 · Updated last year
RUCAIBox / HaluAgent
☆23 · Jul 1, 2024 · Updated 2 years ago
ynchuang / awesome-efficient-xai
☆16 · Feb 7, 2023 · Updated 3 years ago