UW-Madison-Lee-Lab / VersaPRM
☆20 Updated 3 months ago
Alternatives and similar repositories for VersaPRM
Users who are interested in VersaPRM are comparing it to the repositories listed below.
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆74 Updated 11 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin… ☆50 Updated last year
- ☆59 Updated 8 months ago
- ☆63 Updated last week
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆36 Updated 9 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 Updated 7 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆64 Updated last year
- ☆17 Updated 4 months ago
- Directional Preference Alignment ☆57 Updated 7 months ago
- ☆46 Updated 3 weeks ago
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens ☆24 Updated last year
- ☆14 Updated 5 months ago
- ☆22 Updated 10 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆81 Updated last month
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆52 Updated 2 months ago
- ☆92 Updated 7 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆77 Updated 4 months ago
- Long Context Extension and Generalization in LLMs ☆55 Updated 7 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆61 Updated 5 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated last month
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated 11 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆58 Updated 4 months ago
- Discriminator-Guided Chain-of-Thought Reasoning ☆47 Updated 7 months ago
- Self-Supervised Alignment with Mutual Information ☆18 Updated 11 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆72 Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆69 Updated last month
- ☆45 Updated 6 months ago
- ☆19 Updated last year
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆32 Updated last year