divelab / Sys2Bench
Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.
☆24Updated 4 months ago
Alternatives and similar repositories for Sys2Bench
Users who are interested in Sys2Bench are comparing it to the repositories listed below.
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆85Updated 6 months ago
- The code implementation of Symbolic-MoE☆35Updated 4 months ago
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models☆21Updated last year
- ☆20Updated 3 months ago
- [NAACL 25 main] Awesome LLM Causal Reasoning is a collection of LLM-based causal reasoning works, including papers, code, and datasets.☆71Updated 5 months ago
- A collection of resources and papers on AI Scientist / Robot Scientist☆81Updated last month
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"☆30Updated 3 weeks ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆60Updated 5 months ago
- Discriminative Constrained Optimization for Reinforcing Large Reasoning Models☆30Updated last month
- The rule-based evaluation subset and code implementation of Omni-MATH☆22Updated 6 months ago
- A Sober Look at Language Model Reasoning☆77Updated last month
- ☆19Updated 3 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- ☆46Updated 7 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆123Updated 10 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆83Updated 11 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆28Updated 4 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆58Updated last year
- Official Repository of LatentSeek☆54Updated last month
- ☆41Updated 8 months ago
- Exploring whether LLMs perform case-based or rule-based reasoning☆29Updated last year
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"☆78Updated 3 months ago
- ☆18Updated 4 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆103Updated 2 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆49Updated 8 months ago
- Official implementation of the paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"☆192Updated last week
- ☆71Updated 8 months ago
- ☆180Updated last month
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)☆27Updated this week
- ☆26Updated 3 months ago