DFKI-NLP / LLMCheckup

Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-explanation" (Wang et al. 2024)

☆13

Alternatives and similar repositories for LLMCheckup:

Users who are interested in LLMCheckup are comparing it to the libraries listed below.

OSU-NLP-Group / AttrScore
Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"
☆53 Updated last year
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers
☆124 Updated 10 months ago
acl-org / acl-2024
Repository for the ACL 2024 conference website
☆17 Updated 3 months ago
epfl-dlab / SynthIE
The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Gen…
☆60 Updated last year
hitz-zentroa / lm-contamination
The LM Contamination Index is a manually created database of contamination evidence for LMs.
☆76 Updated 9 months ago
abhika-m / FAVA
☆64 Updated 11 months ago
microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆93 Updated last year
neulab / data-agora
[arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators"
☆30 Updated last month
ryokamoi / wice
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
☆40 Updated last year
yuxiaw / OpenFactCheck
☆38 Updated 7 months ago
yuxiaw / Factcheck-GPT
Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation.
☆83 Updated last year
McGill-NLP / instruct-qa
Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
☆82 Updated 5 months ago
primeqa / clapnq
☆37 Updated 6 months ago
yzjiao / On-Demand-IE
Code and dataset for the EMNLP paper "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction"
☆50 Updated last year
Tiiiger / benchmark_llm_summarization
☆35 Updated last year
intuit / sac3
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
☆34 Updated this week
project-miracl / hagrid
A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution
☆30 Updated last year
nlp-uoregon / mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
☆115 Updated 4 months ago
salesforce / factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
☆59 Updated 9 months ago
osainz59 / t5-encoder
An extension of the Transformers library to include a T5ForSequenceClassification class.
☆37 Updated last year
eladsegal / strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
☆65 Updated 2 years ago
vicgalle / distilled-self-critique
Distilled Self-Critique refines the outputs of an LLM with only synthetic data
☆11 Updated 9 months ago
zhudotexe / fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)
☆45 Updated 3 months ago
Betswish / MIRAGE
Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/
☆21 Updated last month
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆57 Updated last year
kasnerz / tabgenie
A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands.
☆55 Updated 8 months ago
GXimingLu / neurologic_decoding
☆81 Updated last year
xu1998hz / InstructScore_SEScore3
First explanation metric (diagnostic report) for text generation evaluation
☆62 Updated 6 months ago
Heidelberg-NLP / CCKG
Repository to create CCKGs from the paper "Similarity-weighted Construction of Contextualized Commonsense Knowledge Graphs for Knowledge-…
☆11 Updated last year