OATML / semantic-entropy-probes
☆27 · Updated 8 months ago
Alternatives and similar repositories for semantic-entropy-probes:
Users interested in semantic-entropy-probes are comparing it to the libraries listed below.
- Code associated with "Tuning Language Models by Proxy" (Liu et al., 2024) · ☆108 · Updated last year
- Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments) · ☆55 · Updated last year
- ☆87 · Updated 9 months ago
- ☆48 · Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering · ☆53 · Updated 5 months ago
- Evaluate the Quality of Critique · ☆34 · Updated 10 months ago
- ☆164 · Updated 10 months ago
- ☆50 · Updated 2 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) · ☆57 · Updated last year
- ☆93 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models · ☆115 · Updated last year
- ☆82 · Updated 8 months ago
- ☆40 · Updated last year
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" · ☆104 · Updated 6 months ago
- ☆68 · Updated 3 months ago
- ☆36 · Updated 3 months ago
- Code repo for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" · ☆113 · Updated last year
- AbstainQA (ACL 2024) · ☆25 · Updated 6 months ago
- LoFiT: Localized Fine-tuning on LLM Representations · ☆37 · Updated 3 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity · ☆72 · Updated last month
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering · ☆57 · Updated 4 months ago
- Function Vectors in Large Language Models (ICLR 2024) · ☆161 · Updated last week
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) · ☆35 · Updated 5 months ago
- Code for "Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities" (NeurIPS'24) · ☆21 · Updated 4 months ago
- PASTA: Post-hoc Attention Steering for LLMs · ☆114 · Updated 5 months ago
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" · ☆40 · Updated last week
- Scalable Meta-Evaluation of LLMs as Evaluators · ☆42 · Updated last year
- Restore safety in fine-tuned language models through task arithmetic · ☆28 · Updated last year
- [NeurIPS 2024] How do Large Language Models Handle Multilingualism? · ☆32 · Updated 5 months ago
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales · ☆88 · Updated 2 months ago