SkyworkAI / skywork-o1-prm-inference
☆60 Updated 4 months ago
Alternatives and similar repositories for skywork-o1-prm-inference:
Users who are interested in skywork-o1-prm-inference compare it to the repositories listed below.
- ☆101 Updated 3 months ago
- ☆171 Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆161 Updated 2 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆121 Updated 8 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆67 Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 Updated 2 weeks ago
- ☆144 Updated 3 months ago
- Reproducing R1 for Code with Reliable Rewards ☆140 Updated 3 weeks ago
- Repo of paper "Free Process Rewards without Process Labels" ☆138 Updated 2 weeks ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆107 Updated last week
- Reformatted Alignment ☆115 Updated 6 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆113 Updated last week
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆147 Updated 6 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆60 Updated 5 months ago
- The official repository of the Omni-MATH benchmark. ☆78 Updated 3 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆247 Updated 3 months ago
- ☆98 Updated 6 months ago
- ☆101 Updated last year
- Repository of LV-Eval Benchmark ☆61 Updated 7 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆100 Updated 3 months ago
- A research repo for experiments about Reinforcement Finetuning ☆37 Updated 2 weeks ago
- On Memorization of Large Language Models in Logical Reasoning ☆60 Updated this week
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆171 Updated 3 weeks ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆153 Updated 9 months ago
- SOTA RL fine-tuning solution for advanced math reasoning of LLM ☆92 Updated this week
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆74 Updated 2 months ago
- ☆81 Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆216 Updated last week
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆137 Updated 8 months ago