[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
☆76 Jul 18, 2025 Updated 7 months ago
Alternatives and similar repositories for RM-Bench
Users who are interested in RM-Bench are comparing it to the repositories listed below
- ☆47 Mar 25, 2025 Updated 11 months ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 Nov 6, 2023 Updated 2 years ago
- Codes for paper SoAy: A Service-oriented APIs Applying Framework of Large Language Models ☆27 Jul 14, 2025 Updated 7 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆21 Jan 29, 2025 Updated last year
- ☆62 Oct 29, 2024 Updated last year
- [ICLR 2026] Geometric-Mean Policy Optimization ☆100 Jan 26, 2026 Updated last month
- ☆14 Apr 14, 2025 Updated 10 months ago
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity (ACL 2025, oral) ☆30 Jun 14, 2025 Updated 8 months ago
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆41 Sep 24, 2024 Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization ☆13 Mar 20, 2025 Updated 11 months ago
- [KDD24-ADS] R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models ☆11 Apr 9, 2024 Updated last year
- ☆14 Jan 24, 2025 Updated last year
- ☆64 Jan 12, 2026 Updated last month
- Awesome LLM for NLG Evaluation Papers ☆25 Jan 23, 2024 Updated 2 years ago
- ☆11 May 18, 2025 Updated 9 months ago
- ☆12 Apr 25, 2024 Updated last year
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 Feb 9, 2026 Updated 3 weeks ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 Sep 28, 2025 Updated 5 months ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 Dec 13, 2024 Updated last year
- The original Shared Recurrent Memory Transformer implementation ☆33 Jul 11, 2025 Updated 7 months ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆34 Jun 13, 2025 Updated 8 months ago
- ☆15 Apr 11, 2024 Updated last year
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models ☆66 Mar 8, 2025 Updated 11 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 Sep 24, 2024 Updated last year
- ☆46 Sep 27, 2025 Updated 5 months ago
- ☆62 May 13, 2025 Updated 9 months ago
- [EMNLP2024] Aligning Large Language Models on Information Extraction ☆54 Nov 4, 2024 Updated last year
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆97 Feb 21, 2025 Updated last year
- ☆27 May 20, 2025 Updated 9 months ago
- ☆23 May 21, 2025 Updated 9 months ago
- ☆12 Dec 20, 2024 Updated last year
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ☆18 May 23, 2025 Updated 9 months ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆14 Jun 21, 2024 Updated last year
- [AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models ☆20 Aug 1, 2025 Updated 7 months ago
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆21 Jun 15, 2025 Updated 8 months ago
- ☆41 Feb 22, 2026 Updated last week
- o1 Chain of Thought Examples ☆33 Oct 4, 2024 Updated last year
- xKV: Cross-Layer SVD for KV-Cache Compression ☆45 Nov 30, 2025 Updated 3 months ago
- A comprehensive and efficient long-context model evaluation framework ☆31 Feb 25, 2026 Updated last week