[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
☆39Jul 19, 2024Updated last year
Alternatives and similar repositories for KIEval
Users that are interested in KIEval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆19May 25, 2024Updated last year
- Improving fast adversarial training with prior-guided knowledge (TPAMI2024)☆43Apr 21, 2024Updated last year
- Code and Data for GlitchBench☆13Feb 27, 2024Updated 2 years ago
- The repository for paper <Evaluating Open-QA Evaluation>☆25Apr 9, 2024Updated 2 years ago
- ☆920May 22, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)☆144Apr 7, 2025Updated last year
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation☆14Aug 19, 2025Updated 8 months ago
- ☆31Jun 12, 2024Updated last year
- [KDD'22] Partial Label Learning with Discrimination Augmentation☆10May 21, 2024Updated last year
- ☆19Feb 3, 2022Updated 4 years ago
- 🎉 TrustJudge is accepted to ICLR 2026!☆46Sep 27, 2025Updated 6 months ago
- ☆32Jul 11, 2024Updated last year
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆26Mar 4, 2025Updated last year
- playing with gpt4☆14Mar 17, 2023Updated 3 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Conversational Recommender System Evaluation via Simulation☆19Updated this week
- ☆12Jan 20, 2025Updated last year
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that …☆93Aug 15, 2023Updated 2 years ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models☆17Jun 28, 2025Updated 9 months ago
- ☆49Oct 24, 2023Updated 2 years ago
- ☆40Jan 26, 2025Updated last year
- Leveraging ChatGPT for Text Data Augmentation☆54Sep 21, 2024Updated last year
- JPEG-LM: LLMs as Image Generators with Canonical Codec Representations☆15Sep 29, 2024Updated last year
- Code listing for the paper 'SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detec…☆10Nov 1, 2021Updated 4 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- The code implementation of the paper Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks (A…☆13Jul 16, 2024Updated last year
- Official repository for ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors"☆43May 20, 2025Updated 10 months ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Jan 16, 2025Updated last year
- Resolving Knowledge Conflicts in Large Language Models, COLM 2024☆18Oct 7, 2025Updated 6 months ago
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings]☆24Nov 18, 2023Updated 2 years ago
- ☆26Dec 11, 2025Updated 4 months ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆98Jan 29, 2024Updated 2 years ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆57May 28, 2025Updated 10 months ago
- This repository includes the code implementation of the paper Improving Pacing in Long-Form Story Planning by Yichen Wang, Kevin Yang, Xi…☆17Nov 19, 2024Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- Repository for the ACL 2024 conference website☆18Feb 3, 2025Updated last year
- Code for CVPR2018 "Iterative Learning with Open-set Noisy Labels"☆12Mar 12, 2021Updated 5 years ago
- ☆48Sep 5, 2024Updated last year
- Official Code for ICLR 2023 Paper: A Message Passing Perspective on Learning Dynamics of Contrastive Learning☆11Mar 9, 2023Updated 3 years ago
- [IEEE TMI'23] IOP-FL: Inside-Outside Personalization for Federated Medical Image Segmentation☆12Apr 2, 2023Updated 3 years ago
- [AAAI 2024] DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models☆12Dec 5, 2024Updated last year
- Code for ACL-2023 paper "Measuring Consistency in Text-based Financial Forecasting Models"☆16Jan 22, 2024Updated 2 years ago