GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
☆238 · Updated last year
Alternatives and similar repositories for auto-j
Users who are interested in auto-j are comparing it to the repositories listed below.
- ☆282 · Updated 10 months ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆263 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ☆554 · Updated 5 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆140 · Updated 3 weeks ago
- FireAct: Toward Language Agent Fine-tuning ☆278 · Updated last year
- A large-scale, fine-grained, diverse preference dataset (and models). ☆340 · Updated last year
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆264 · Updated 8 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆261 · Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆124 · Updated 11 months ago
- Data and Code for Program of Thoughts (TMLR 2023) ☆274 · Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆368 · Updated 8 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios