chunhuizhang / prompts_for_academic
☆98Updated last week
Alternatives and similar repositories for prompts_for_academic
Users that are interested in prompts_for_academic are comparing it to the libraries listed below
- Scaling Preference Data Curation via Human-AI Synergy☆132Updated 5 months ago
- ☆185Updated last week
- MiroRL is an MCP-first reinforcement learning framework for deep research agent.☆182Updated 3 months ago
- A research repo for experiments about Reinforcement Finetuning☆53Updated 8 months ago
- llm & rl☆258Updated last month
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆93Updated last month
- ☆52Updated 9 months ago
- Generative AI Act II: Test Time Scaling Drives Cognition Engineering☆209Updated 7 months ago
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents☆238Updated 2 weeks ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆92Updated last month
- ☆45Updated 4 months ago
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆141Updated 2 weeks ago
- ☆86Updated 3 months ago
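- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's self-verification and reflection abilities. Clean, minimal, accessible reproduction of Dee…☆33Updated 8 months ago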
- ☆69Updated 5 months ago
- PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining p…☆34Updated 3 months ago
- ☆132Updated 7 months ago
- A Comprehensive Survey on Long Context Language Modeling☆213Updated 3 weeks ago
- Extrapolating RLVR to General Domains without Verifiers☆181Updated 4 months ago
- Awesome List for Agentic RL☆585Updated this week
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆294Updated last month
- ☆155Updated last month
- Custom reward development on top of verl☆135Updated 6 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models (NeurIPS 2025)☆168Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆154Updated 11 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆56Updated last year
- ☆171Updated last week
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓☆35Updated 8 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆221Updated 4 months ago
- ☆114Updated 6 months ago