WisdomShell / RewardAnything
RewardAnything: Generalizable Principle-Following Reward Models
☆45Updated 6 months ago
Alternatives and similar repositories for RewardAnything
Users who are interested in RewardAnything are comparing it to the repositories listed below.
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆43Updated 9 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆154Updated 5 months ago
- ☆173Updated 2 weeks ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]☆36Updated 3 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆60Updated 6 months ago
- ☆54Updated 5 months ago
- ☆52Updated last year
- ☆38Updated 4 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM…☆68Updated last year
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆124Updated 8 months ago
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆24Updated 2 months ago
- [ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO☆60Updated 7 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆69Updated 5 months ago
- ☆36Updated 5 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆64Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know?☆85Updated last year
- ☆39Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆51Updated 6 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆118Updated 7 months ago
- Instruction-following benchmark for large reasoning models☆45Updated 4 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect…☆131Updated last month
- ☆69Updated 6 months ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆83Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆38Updated last year
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆154Updated this week
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs☆48Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆92Updated last year
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style☆72Updated 5 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning☆73Updated last year
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆80Updated 2 months ago