[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
☆39 · Jul 19, 2024 · Updated last year
Alternatives and similar repositories for KIEval
Users interested in KIEval also compare it with the repositories listed below.
- ☆19 · May 25, 2024 · Updated last year
- ☆16 · Feb 28, 2024 · Updated 2 years ago
- Code and Data for GlitchBench ☆13 · Feb 27, 2024 · Updated 2 years ago
- ☆31 · Jun 12, 2024 · Updated last year
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 · Aug 19, 2025 · Updated 6 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆23 · Mar 4, 2025 · Updated last year
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆17 · May 19, 2025 · Updated 9 months ago
- Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization ☆22 · Mar 12, 2025 · Updated 11 months ago
- ☆23 · Jan 25, 2023 · Updated 3 years ago
- ☆19 · Feb 3, 2022 · Updated 4 years ago
- The repository for paper <Evaluating Open-QA Evaluation> ☆25 · Apr 9, 2024 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆57 · May 28, 2025 · Updated 9 months ago
- ☆32 · Jul 11, 2024 · Updated last year
- [AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research ☆30 · Aug 6, 2024 · Updated last year
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning ☆25 · Feb 1, 2024 · Updated 2 years ago
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings] ☆24 · Nov 18, 2023 · Updated 2 years ago
- Oak National Academy's AI Auto Eval tools provide LLM-as-a-judge evaluation of lesson plans and resources ☆17 · Nov 4, 2025 · Updated 4 months ago
- ☆922 · May 22, 2024 · Updated last year
- exploring whether LLMs perform case-based or rule-based reasoning ☆30 · Mar 2, 2024 · Updated 2 years ago
- ☆37 · Jan 26, 2025 · Updated last year
- ☆33 · Jun 24, 2024 · Updated last year
- This is the implementation of LeCo ☆31 · Jan 20, 2025 · Updated last year
- Google Chrome Extension for recording Google Meet transcripts ☆12 · Aug 6, 2020 · Updated 5 years ago
- This project showcases engaging interactions between two AI chatbots. ☆10 · Jan 10, 2024 · Updated 2 years ago
- Evaluating LLMs with fewer examples ☆169 · Apr 12, 2024 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆12 · Jul 14, 2025 · Updated 7 months ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 · Dec 12, 2024 · Updated last year
- ☆11 · Aug 22, 2022 · Updated 3 years ago
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- ☆12 · Jan 11, 2026 · Updated last month
- ☆43 · Oct 7, 2024 · Updated last year
- [ICLR 2024] Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks ☆46 · Feb 20, 2024 · Updated 2 years ago
- ☆10 · Oct 22, 2024 · Updated last year
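- Chinese financial large-model evaluation benchmark: twenty-five tasks across six categories with tiered grading; domestic models achieved grade A ☆10 · May 6, 2024 · Updated last year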
- Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile. ☆15 · Jun 3, 2023 · Updated 2 years ago
- Survey of available speech datasets for Polish ASR development ☆17 · Jan 1, 2025 · Updated last year
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- ☆12 · Nov 5, 2024 · Updated last year
- This project aims to provide the best possible customer service using LLMs such as GPT. ☆10 · Jun 29, 2023 · Updated 2 years ago