zhuohaoyu / KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
☆39 · Jul 19, 2024 · Updated last year
Alternatives and similar repositories for KIEval
Users who are interested in KIEval are comparing it to the libraries listed below.
- ☆19 · Aug 3, 2024 · Updated last year
- Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction ☆24 · Sep 30, 2022 · Updated 3 years ago
- ☆16 · Feb 28, 2024 · Updated last year
- Code and Data for GlitchBench ☆13 · Feb 27, 2024 · Updated last year
- ☆31 · Jun 12, 2024 · Updated last year
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 · Aug 19, 2025 · Updated 5 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆21 · Mar 4, 2025 · Updated 11 months ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆17 · May 19, 2025 · Updated 8 months ago
- ☆19 · Feb 3, 2022 · Updated 4 years ago
- ☆22 · Jan 25, 2023 · Updated 3 years ago
- Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization ☆22 · Mar 12, 2025 · Updated 11 months ago
- The repository for paper <Evaluating Open-QA Evaluation> ☆25 · Apr 9, 2024 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆57 · May 28, 2025 · Updated 8 months ago
- ☆32 · Jul 11, 2024 · Updated last year
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning ☆25 · Feb 1, 2024 · Updated 2 years ago
- [AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research ☆30 · Aug 6, 2024 · Updated last year
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings] ☆24 · Nov 18, 2023 · Updated 2 years ago
- Oak National Academy's AI Auto Eval tools provide LLM as a judge evaluation on lesson plans and resources ☆17 · Nov 4, 2025 · Updated 3 months ago
- Official code for ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?" ☆31 · Apr 4, 2024 · Updated last year
- exploring whether LLMs perform case-based or rule-based reasoning ☆30 · Mar 2, 2024 · Updated last year
- ☆36 · Jan 26, 2025 · Updated last year
- This is the implementation of LeCo ☆31 · Jan 20, 2025 · Updated last year
- Deep Transfer Learning codes using Google TensorFlow ☆13 · Apr 4, 2016 · Updated 9 years ago
- This project showcases engaging interactions between two AI chatbots. ☆10 · Jan 10, 2024 · Updated 2 years ago
- Evaluating LLMs with fewer examples ☆169 · Apr 12, 2024 · Updated last year
- ☆12 · Jan 11, 2026 · Updated last month
- [CVPR2024] Learning from Synthetic Human Group Activities ☆14 · Feb 24, 2025 · Updated 11 months ago
- ☆11 · Aug 22, 2022 · Updated 3 years ago
- ⚡ Running online SfM 🌐 while capturing images 📸 ☆27 · Sep 27, 2025 · Updated 4 months ago
- A longitudinal dataset for academic literature, including papers, metadata, and citation graphs, Also available on 🤗 HuggingFace and Kag… ☆16 · Sep 6, 2025 · Updated 5 months ago
- [KDD'22] Partial Label Learning with Discrimination Augmentation ☆10 · May 21, 2024 · Updated last year
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 · Dec 12, 2024 · Updated last year
- A Swedish Natural Language Understanding Benchmark ☆11 · Dec 12, 2025 · Updated 2 months ago
- ☆43 · Oct 7, 2024 · Updated last year
- [ICLR 2024] Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks ☆46 · Feb 20, 2024 · Updated last year
- Modified Beam Search with periodical restart ☆12 · Sep 12, 2024 · Updated last year
- 🎉 TrustJudge is accepted to ICLR 2026! ☆38 · Sep 27, 2025 · Updated 4 months ago
- ☆12 · Mar 5, 2025 · Updated 11 months ago
- Website for release of TellMeWhy dataset for why question answering ☆14 · Nov 11, 2022 · Updated 3 years ago