ModalityDance / Awesome-Agent-as-a-Judge
"A Survey on Agent-as-a-Judge"
☆21Updated last week
Alternatives and similar repositories for Awesome-Agent-as-a-Judge
Users who are interested in Awesome-Agent-as-a-Judge are comparing it to the repositories listed below
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆52Updated 7 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning☆64Updated 2 months ago
- Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation☆55Updated 3 months ago
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…☆69Updated 10 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆32Updated 5 months ago
- Exploration of automated dataset selection approaches at large scales.☆53Updated 10 months ago
- ☆107Updated last month
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024]☆71Updated last year
- ☆214Updated 7 months ago
- RewardAnything: Generalizable Principle-Following Reward Models☆45Updated 7 months ago
- ☆53Updated 11 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆60Updated 7 months ago
- This is the implementation of LeCo☆31Updated 11 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]☆210Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆43Updated 10 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆65Updated last year
- DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems☆62Updated last year
- ☆75Updated last year
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs☆48Updated last year
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)☆35Updated 5 months ago
- ☆58Updated 2 months ago
- ☆50Updated 11 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆57Updated 11 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆182Updated 5 months ago
- [ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models☆48Updated 3 weeks ago
- Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545☆50Updated last year
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆124Updated last year
- Benchmarking Benchmark Leakage in Large Language Models☆58Updated last year
- ☆140Updated 10 months ago
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning☆20Updated 3 months ago