chunhuizhang / prompts_for_academic
☆55 Updated this week
Alternatives and similar repositories for prompts_for_academic
Users that are interested in prompts_for_academic are comparing it to the libraries listed below
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆91 Updated 7 months ago
- ☆83 Updated 2 months ago
- ☆54 Updated last year
- Scaling Preference Data Curation via Human-AI Synergy ☆122 Updated 4 months ago
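- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's self-verification and reflection abilities. About Clean, minimal, accessible reproduction of Dee… ☆30 Updated 6 months ago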
- ☆104 Updated 10 months ago
- ☆50 Updated 7 months ago
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models ☆119 Updated 2 weeks ago
- ☆122 Updated 5 months ago
- ☆111 Updated 5 months ago
- A research repo for experiments about Reinforcement Finetuning ☆52 Updated 6 months ago
- ☆154 Updated 3 weeks ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆152 Updated 10 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆220 Updated 3 months ago
- MiroRL is an MCP-first reinforcement learning framework for deep research agent. ☆169 Updated 2 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆89 Updated 5 months ago
- a toolkit on knowledge distillation for large language models ☆191 Updated 2 weeks ago
- ☆169 Updated 6 months ago
- ☆33 Updated 7 months ago
- llm & rl ☆236 Updated last week
- ☆49 Updated last year
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the… ☆164 Updated 3 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆33 Updated last year
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings) ☆73 Updated 2 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆78 Updated last month
- Extrapolating RLVR to General Domains without Verifiers ☆176 Updated 2 months ago
- ☆161 Updated 9 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆85 Updated 4 months ago
- ☆125 Updated last year
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆98 Updated 3 weeks ago