[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
☆40Jul 19, 2024Updated last year
Alternatives and similar repositories for KIEval
Users that are interested in KIEval are comparing it to the libraries listed below.
- ☆19Aug 3, 2024Updated last year
- ☆19May 25, 2024Updated 2 years ago
- Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction☆24Sep 30, 2022Updated 3 years ago
- ☆17Feb 28, 2024Updated 2 years ago
- ☆68Feb 1, 2025Updated last year
- We have developed Symbol Demonstration Direct Preference Optimization (SymDPO) and validated its effectiveness across multiple benchmark…☆23Nov 22, 2024Updated last year
- Code and Data for GlitchBench☆13Feb 27, 2024Updated 2 years ago
- The repository for paper <Evaluating Open-QA Evaluation>☆25Apr 9, 2024Updated 2 years ago
- Code for Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack(TPAMI 2025)☆42Aug 28, 2025Updated 9 months ago
- ☆923May 22, 2024Updated 2 years ago
- ☆31Jun 12, 2024Updated last year
- ☆19Feb 3, 2022Updated 4 years ago
- ☆23Jan 25, 2023Updated 3 years ago
- ☆468Feb 7, 2025Updated last year
- 🎉 TrustJudge is accepted to ICLR 2026!☆46Sep 27, 2025Updated 8 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆26Mar 4, 2025Updated last year
- playing with gpt4☆13Mar 17, 2023Updated 3 years ago
- Conversational Recommender System Evaluation via Simulation☆19Updated this week
- Code for COLING 2022 paper "FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition"☆15Jan 15, 2023Updated 3 years ago
- The official code for paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"☆10Jun 23, 2024Updated last year
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆21Mar 7, 2024Updated 2 years ago
- ☆12Jan 20, 2025Updated last year
- ☆12Jun 29, 2024Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models☆17Jun 28, 2025Updated 11 months ago
- Official repository for Decentralized Arena via Collective LLM Intelligence☆17May 19, 2025Updated last year
- Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedbac…☆61Mar 25, 2026Updated 2 months ago
- ☆14Aug 30, 2023Updated 2 years ago
- ☆51Oct 24, 2023Updated 2 years ago
- Leveraging ChatGPT for Text Data Augmentation☆54Sep 21, 2024Updated last year
- Detect and defend against the nonce race exploit on Polymarket's CTF Exchange☆59Mar 17, 2026Updated 2 months ago
- ☆16Dec 14, 2023Updated 2 years ago
- Official repository for ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors"☆43May 20, 2025Updated last year
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Jan 16, 2025Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023.☆20Dec 25, 2023Updated 2 years ago
- Resolving Knowledge Conflicts in Large Language Models, COLM 2024☆18Oct 7, 2025Updated 7 months ago
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings]☆24Nov 18, 2023Updated 2 years ago
- Clean, extensible implementation of MACAW [ICML 2021]☆12Dec 7, 2021Updated 4 years ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆99Jan 29, 2024Updated 2 years ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆58May 28, 2025Updated last year