DFKI-NLP / LLMCheckup
Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-explanation" (Wang et al. 2024)
☆13 · Updated 9 months ago
Alternatives and similar repositories for LLMCheckup:
Users interested in LLMCheckup are comparing it to the repositories listed below.
- Code, datasets, and models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆53 · Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆124 · Updated 10 months ago
- Repository for the ACL 2024 conference website ☆17 · Updated 3 months ago
- The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Gen…" ☆60 · Updated last year
- The LM Contamination Index is a manually curated database of contamination evidence for LMs. ☆76 · Updated 9 months ago
- ☆64 · Updated 11 months ago
- Token-level Reference-free Hallucination Detection ☆93 · Updated last year
- [arXiv preprint] Official repository for "Evaluating Language Models as Synthetic Data Generators" ☆30 · Updated last month
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" (EMNLP 2023). ☆40 · Updated last year
- ☆38 · Updated 7 months ago
- Fact-checking the output of generative large language models in both annotation and evaluation. ☆83 · Updated last year
- Code and data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆82 · Updated 5 months ago
- ☆37 · Updated 6 months ago
- Code and dataset for the EMNLP paper "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction" ☆50 · Updated last year
- ☆35 · Updated last year
- Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency ☆34 · Updated this week
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆30 · Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆115 · Updated 4 months ago
- Code for the arXiv paper "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 9 months ago
- An extension of the Transformers library that adds a T5ForSequenceClassification class. ☆37 · Updated last year
- The official code of the TACL 2021 paper "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆65 · Updated 2 years ago
- Distilled Self-Critique: refines the outputs of an LLM with only synthetic data. ☆11 · Updated 9 months ago
- Companion code for "FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models" (ACL 2024) ☆45 · Updated 3 months ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆21 · Updated last month
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 · Updated last year
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆55 · Updated 8 months ago
- ☆81 · Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆62 · Updated 6 months ago
- Repository to create CCKGs from the paper "Similarity-weighted Construction of Contextualized Commonsense Knowledge Graphs for Knowledge-…" ☆11 · Updated last year