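A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24-point game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's capacity for self-verification and reflection. About: Clean, minimal, accessible reproduction of DeepSeek R1-Zero, DeepSeek R1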
☆33Apr 5, 2025Updated last year
Alternatives and similar repositories for 24-Game-Reasoning
Users that are interested in 24-Game-Reasoning are comparing it to the libraries listed below.
- ☆68Oct 27, 2025Updated 7 months ago
- Long CoT Fine-Tuning and Reinforcement Learning for LLMs in the Context of the 24-Point Game: A Toy Project☆25Feb 22, 2025Updated last year
- Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"☆175Mar 18, 2026Updated 2 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- Metaskill: A Meta-Skill for Autonomous AI Agent Team Generation☆40Feb 23, 2026Updated 3 months ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…☆16Feb 15, 2024Updated 2 years ago
- This project fine-tunes Deepseek-R1-Distill-Qwen-7B with LoRA on psychological counseling CoT data to further improve its slow-thinking ability in the psychological counseling domain.☆12Mar 11, 2025Updated last year
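- This project builds and optimizes from scratch a large-scale pretrained language model at the tens-of-millions-of-parameters level, covering three stages: pretraining, supervised fine-tuning (SFT), and R1 reasoning distillation. It uses a custom Transformer architecture (including RMSNorm, grouped attention, a multi-query mechanism, SwiGLU activation, and RoPE positional encoding) to achieve efficient long-text processing and…☆22Mar 10, 2025Updated last year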
- A collection of some awesome public projects about LLM-based Web Agents and Tools.☆13Apr 25, 2024Updated 2 years ago
- The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration" (ICML 2026)☆24Feb 4, 2026Updated 3 months ago
- Xiaohongshu crawler☆10May 26, 2023Updated 3 years ago
- Everyday scripts☆15Feb 10, 2026Updated 3 months ago
- Hacking the TGAM1 Neurosky EEG chip with an Arduino.☆12Feb 2, 2018Updated 8 years ago
- Guqin jzp OCR and music generation☆13May 20, 2025Updated last year
- Code and data for EMNLP 2023 research track paper "MarkQA: A large scale KBQA dataset with numerical reasoning"☆12Jan 2, 2024Updated 2 years ago
- The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding"☆22Jun 26, 2025Updated 11 months ago
- ☆22Jan 3, 2026Updated 4 months ago
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'☆27May 16, 2025Updated last year
- ☆14Dec 25, 2024Updated last year
- ☆10Jul 11, 2022Updated 3 years ago
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)☆81Aug 20, 2025Updated 9 months ago
- ☆19Jun 14, 2024Updated last year
- A browser-based Xiaohongshu crawler written in JavaScript☆13Apr 24, 2023Updated 3 years ago
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.☆21Apr 3, 2025Updated last year
- A collection of metric learning papers.☆21Apr 24, 2023Updated 3 years ago
- An Android app that converts text/voice input to American Sign Language (ASL)☆14Sep 8, 2016Updated 9 years ago
- Solving High Frequency and Multi-Scale PDEs with Gaussian Processes (ICLR 2024)☆25Jun 7, 2024Updated last year
- ☆46May 27, 2025Updated last year
- ☆34Apr 1, 2025Updated last year
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning☆42Apr 13, 2026Updated last month
- Extra documentation (knowledge graph, images, etc.) for the paper "Legal Knowledge Extraction for Knowledge Graph Based Question-Answeri…☆15Jan 16, 2023Updated 3 years ago
- ✨✨ Official repo for "Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning"☆16Nov 8, 2024Updated last year
- ☆29Oct 6, 2021Updated 4 years ago
- A Code System for Grammar Error Correction Method. Code Repo for ACL 24 Main "Detection-Correction Structure via General Language Model f…☆23Sep 17, 2024Updated last year
- Code of "Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model"☆14Jul 8, 2025Updated 10 months ago
- DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models (NeurIPS 2024 D&B Track)☆24Mar 6, 2025Updated last year
- A project implementing various agentic RL based on the Slime post-training framework☆406Apr 11, 2026Updated last month
- ☆14Nov 29, 2023Updated 2 years ago
- Wafer-thin FlaskGPT on Vercel.☆12Jun 1, 2023Updated 2 years ago