sylvain-wei/24-Game-Reasoning



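A very simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as the example task. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own reasoning. About: Clean, minimal, accessible reproduction of DeepSeek R1-Zero and DeepSeek R1.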

☆35

Alternatives and similar repositories for 24-Game-Reasoning

Users who are interested in 24-Game-Reasoning are comparing it to the repositories listed below.


sylvain-wei / TIME
[NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario
☆33 · Oct 5, 2025 · Updated 9 months ago
MoonshotAI / zsh-kimi-cli
☆68 · Oct 27, 2025 · Updated 8 months ago
He-Ren / OJBench
☆32 · Feb 28, 2026 · Updated 4 months ago
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆28 · Dec 23, 2024 · Updated last year
Yifan-Song793 / GoodBadGreedy
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
☆31 · Jul 17, 2024 · Updated last year
UMass-Embodied-AGI / CommVQ
[ICML 2025] CommVQ: Commutative Vector Quantization for KV Cache Compression
☆27 · Sep 2, 2025 · Updated 10 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆94 · Dec 22, 2024 · Updated last year
F2-Song / ICDPO
The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…
☆16 · Feb 15, 2024 · Updated 2 years ago
JiehuiXie / PsychCoT-Tuning
This project fine-tunes Deepseek-R1-Distill-Qwen-7B with LoRA on psychological counseling CoT data to further improve the model's slow-thinking ability in the psychological counseling domain.
☆12 · Mar 11, 2025 · Updated last year
FranxYao / Retrieval-Head-with-Flash-Attention
Efficient retrieval head analysis with triton flash attention that supports topK probability
☆13 · Jun 15, 2024 · Updated 2 years ago
likenneth / dialogue_action_token
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
☆31 · Jun 27, 2024 · Updated 2 years ago
xvirobotics / metaskill
Metaskill: A Meta-Skill for Autonomous AI Agent Team Generation
☆58 · Feb 23, 2026 · Updated 4 months ago
srsohn / TOD-Flow
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues
☆13 · Feb 7, 2024 · Updated 2 years ago
chenzixuan99 / Awesome-LLM-based-Web-Agent-and-Tools
A collection of some awesome public projects about LLM-based Web Agents and Tools.
☆13 · Apr 25, 2024 · Updated 2 years ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
Lux0926 / ASPRM
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
☆10 · Mar 2, 2025 · Updated last year
xiaoshame / script
Everyday scripts
☆15 · Jun 2, 2026 · Updated last month
weiyifan1023 / MenatQA
Code and Data for EMNLP 2023 Paper "MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Langu…
☆14 · Apr 7, 2025 · Updated last year
wizardlancet / diagnosis_zero
diagnosis_zero: an R1-Zero reproduction for disease diagnosis
☆33 · Jul 24, 2025 · Updated 11 months ago
Fu-Dayuan / AgentRefine
(ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning
☆20 · Nov 22, 2025 · Updated 7 months ago
ljang0 / videowebarena
☆14 · Dec 25, 2024 · Updated last year
yoqim / PR-HFR
☆13 · Nov 23, 2022 · Updated 3 years ago
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆27 · May 16, 2025 · Updated last year
AILWQ / Joint_Supervised_Learning_for_SR
[ICLR 2023] This repository contains the official Pytorch implementation for the paper "Transformer-based model for symbolic regression v…
☆29 · Jul 2, 2025 · Updated last year
weitongseu / PCL
☆10 · Jul 11, 2022 · Updated 3 years ago
ChenglinYu / BHN
☆10 · May 28, 2023 · Updated 3 years ago
WeiminXiong / MPO
MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)
☆81 · Aug 20, 2025 · Updated 10 months ago
AdelWang / MIGRES
☆19 · Jun 14, 2024 · Updated 2 years ago
SuperIron / xhs
A browser-based Xiaohongshu crawler written in JavaScript
☆13 · Apr 24, 2023 · Updated 3 years ago
kkk-an / UltraIF
Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.
☆21 · Apr 3, 2025 · Updated last year
ikheu / point_reactor
A solver for the point reactor neutron kinetics equations based on first-order Taylor expansion
☆16 · Mar 18, 2019 · Updated 7 years ago
jyfang6 / REANO
[ACL 2024] REANO: Optimising Retrieval-Augmented Reader Models through Knowledge Graph Generation
☆12 · Sep 4, 2024 · Updated last year
NVlabs / NFT
Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…
☆86 · Sep 8, 2025 · Updated 10 months ago
NJUNLP / PATS
☆46 · May 27, 2025 · Updated last year
LCO-Embedding / LCO-Embedding
[NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning
☆46 · Apr 13, 2026 · Updated 2 months ago
idaholab / EMRALD
Event Modeling Risk Assessment using Linked Diagrams (EMRALD) is a software tool developed at INL for researching the capabilities of dyn…
☆29 · Updated this week
HKUST-KnowComp / AbductiveKGR
[ACL 2024] Implementation for Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation
☆15 · Oct 9, 2025 · Updated 9 months ago
swtheing / LLM-Performance-Improvement-Paper
☆17 · Jul 10, 2023 · Updated 2 years ago
Ding-ZJ / GLoDe
Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition (IJCAI 2024)
☆11 · Aug 18, 2024 · Updated last year