RM-R1-UIUC / RM-R1
☆40 Updated this week
Alternatives and similar repositories for RM-R1:
Users interested in RM-R1 are comparing it to the repositories listed below.
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆41 Updated 3 weeks ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆72 Updated 6 months ago
- ☆20 Updated 2 months ago
- ☆22 Updated 10 months ago
- ☆16 Updated 9 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 Updated 6 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated 4 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆53 Updated 2 weeks ago
- ☆15 Updated 3 weeks ago
- ☆24 Updated 3 weeks ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆21 Updated 2 months ago
- ☆36 Updated 3 weeks ago
- This repository introduces a comprehensive paper list, datasets, methods and tools for memory research. ☆26 Updated this week
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆39 Updated 2 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆33 Updated 9 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆49 Updated 9 months ago
- ☆43 Updated 3 weeks ago
- ☆18 Updated this week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆62 Updated 2 weeks ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆32 Updated last year
- Large Language Models Can Self-Improve in Long-context Reasoning ☆69 Updated 5 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆51 Updated 5 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 Updated 7 months ago
- ☆11 Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 Updated 2 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 3 weeks ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆37 Updated 10 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆26 Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆67 Updated 2 months ago