yinzhangyue / SelfAware

Do Large Language Models Know What They Don’t Know?

☆102

Alternatives and similar repositories for SelfAware

Users who are interested in SelfAware are comparing it to the repositories listed below.

edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆55 Updated last year
hongbinye / Cognitive-Mirage-Hallucinations-in-LLMs
Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"
☆47 Updated 2 years ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆132 Updated 2 years ago
shizhediao / R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…
☆125 Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆134 Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69 Updated last year
GAIR-NLP / alignment-for-honesty
☆76 Updated last year
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆84 Updated last year
OSU-NLP-Group / LLM-Knowledge-Conflict
[ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"
☆78 Updated last year
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆97 Updated 9 months ago
RUCAIBox / HaluEval-2.0
☆47 Updated last year
i-Eval / FairEval
☆142 Updated 2 years ago
Spico197 / Humpback
🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.
☆138 Updated 6 months ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆119 Updated last year
qinyiwei / InfoBench
☆57 Updated last year
ZeroYuHuang / Transformer-Patcher
☆32 Updated 2 years ago
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆193 Updated last year
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63 Updated last year
icip-cas / awesome-auto-alignment
Collection of papers for scalable automated alignment.
☆94 Updated last year
liyucheng09 / Contamination_Detector
Lightweight tool to identify Data Contamination in LLMs evaluation
☆52 Updated last year
c-box / KnowledgeLifecycle
Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey"
☆59 Updated 2 years ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆76 Updated last year
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82 Updated 10 months ago
YuxiXie / SelfEval-Guided-Decoding
☆103 Updated last year
csitfun / LogiQA2.0
Logiqa2.0 dataset - logical reasoning in MRC and NLI tasks
☆100 Updated 2 years ago
JoeYing1019 / UltraTool
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
☆66 Updated 3 months ago
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆117 Updated 5 months ago
google-research-datasets / GSM-IC
Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…
☆64 Updated 2 years ago
hkust-nlp / felm
Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆61 Updated last year
nayeon7lee / FactualityPrompt
☆86 Updated 3 years ago