i-Eval / FairEval
☆139 · Updated last year
Alternatives and similar repositories for FairEval
Users interested in FairEval are comparing it to the libraries listed below.
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆140 · Updated 2 months ago
- Do Large Language Models Know What They Don't Know? ☆97 · Updated 8 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆127 · Updated last year
- Implementation of ICML 23 paper: Specializing Smaller Language Models towards Multi-Step Reasoning. ☆131 · Updated 2 years ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆178 · Updated last year
- Generative Judge for Evaluating Alignment ☆244 · Updated last year
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆87 · Updated 4 months ago
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆162 · Updated 3 weeks ago
- Paper list on reasoning in NLP ☆190 · Updated 3 months ago
- Code for "Small Models are Valuable Plug-ins for Large Language Models" ☆130 · Updated 2 years ago
- EMNLP'23 survey: a curation of awesome papers and resources on refreshing large language models (LLMs) without expensive retraining. ☆134 · Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆68 · Updated 2 months ago
- Official repo for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆126 · Updated last year
- ☆284 · Updated last year
- Scaling Sentence Embeddings with Large Language Models ☆111 · Updated last year
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆114 · Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 · Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆70 · Updated last year
- ☆278 · Updated 6 months ago
- [NeurIPS 2023] Codebase for the paper "Guiding Large Language Models with Directional Stimulus Prompting" ☆111 · Updated 2 years ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆107 · Updated last month
- Implementation of the paper "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆69 · Updated 11 months ago
- Code for ACL 2023 paper: Pre-Training to Learn in Context ☆107 · Updated 11 months ago
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆81 · Updated last year
- ☆66 · Updated 3 years ago
- Source code of the paper "GPTScore: Evaluate as You Desire" ☆252 · Updated 2 years ago
- Data and code for Program of Thoughts [TMLR 2023] ☆279 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆145 · Updated 8 months ago
- Data and code for the paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models" ☆101 · Updated 2 years ago