GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
★250 · Updated 2 years ago
Alternatives and similar repositories for auto-j
Users who are interested in auto-j are comparing it to the repositories listed below.
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ★137 · Updated 8 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ★73 · Updated 8 months ago
- ★322 · Updated last year
- Data and Code for Program of Thoughts [TMLR 2023] ★303 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ★268 · Updated last year
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ★136 · Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ★284 · Updated 2 years ago
- A large-scale, fine-grained, diverse preference dataset (and models). ★361 · Updated 2 years ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ★285 · Updated 2 years ago
- ★143 · Updated 2 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ★101 · Updated 11 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ★269 · Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ★132 · Updated last year
- ★147 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ★584 · Updated last year
- FireAct: Toward Language Agent Fine-tuning ★292 · Updated 2 years ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ★187 · Updated 7 months ago
- Unofficial implementation of AlpaGasus ★94 · Updated 2 years ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ★193 · Updated last year
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ★213 · Updated 9 months ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ★119 · Updated 7 months ago
- ★51 · Updated last year
- [ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark ★389 · Updated last year
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ★534 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ★200 · Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ★148 · Updated last year
- Collection of papers for scalable automated alignment. ★93 · Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ★415 · Updated 7 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ★264 · Updated 6 months ago
- Do Large Language Models Know What They Don't Know? ★102 · Updated last year