baaivision / JudgeLM
[ICLR 2025 Spotlight] An open-sourced LLM judge for evaluating LLM-generated answers.
☆349 · Updated last month
Alternatives and similar repositories for JudgeLM:
Users who are interested in JudgeLM are comparing it to the libraries listed below.
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts · ☆298 · Updated 6 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context · ☆454 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) · ☆261 · Updated 11 months ago
- RewardBench: the first evaluation tool for reward models. · ☆532 · Updated last month
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them · ☆473 · Updated 9 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users · ☆219 · Updated 4 months ago
- Generative Representational Instruction Tuning · ☆613 · Updated 2 weeks ago
- (no description) · ☆91 · Updated last month
- (no description) · ☆120 · Updated 9 months ago
- FuseAI Project · ☆555 · Updated 2 months ago
- VisualWebArena is a benchmark for multimodal agents. · ☆320 · Updated 4 months ago
- Generative Judge for Evaluating Alignment · ☆232 · Updated last year
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning" · ☆203 · Updated 3 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step · ☆264 · Updated 11 months ago
- (no description) · ☆504 · Updated 4 months ago
- All available datasets for Instruction Tuning of Large Language Models · ☆247 · Updated last year
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) · ☆350 · Updated last week
- Official repository for ORPO · ☆446 · Updated 10 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. · ☆405 · Updated 11 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement · ☆180 · Updated last year
- (no description) · ☆307 · Updated 9 months ago
- FireAct: Toward Language Agent Fine-tuning · ☆274 · Updated last year
- (no description) · ☆142 · Updated 11 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … · ☆664 · Updated 2 weeks ago
- 🐙 OctoPack: Instruction Tuning Code Large Language Models · ☆460 · Updated last month
- DSIR large-scale data selection framework for language model training · ☆244 · Updated 11 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] · ☆297 · Updated 10 months ago
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition · ☆622 · Updated 8 months ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" · ☆242 · Updated 2 years ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning · ☆350 · Updated 6 months ago