A mechanistic approach for understanding and detecting factual errors of large language models.
☆49Jul 6, 2024Updated last year
Alternatives and similar repositories for mechanistic-error-probe
Users that are interested in mechanistic-error-probe are comparing it to the libraries listed below.
- quica is a tool to run inter coder agreement pipelines in an easy and effective way. Multiple measures are run and results are collected…☆23Nov 9, 2020Updated 5 years ago
- Some numerical optimization methods implemented in Haskell☆47Jun 24, 2020Updated 5 years ago
- 👁️ Isometric 3D Graphing / Rendering module for Haskell☆15Sep 2, 2017Updated 8 years ago
- ☆10Nov 1, 2022Updated 3 years ago
- ☆10Nov 8, 2022Updated 3 years ago
- ☆12Oct 25, 2023Updated 2 years ago
- Shared code for training sentence embeddings with Flax / JAX☆28Jul 15, 2021Updated 4 years ago
- ☆16Sep 27, 2023Updated 2 years ago
- Comparative Analysis of Graph Neural Networks for Node Regression task on Wiki-Squirrel dataset (Bachelor's Research Project)☆12Nov 6, 2025Updated 4 months ago
- ☆21Mar 19, 2024Updated 2 years ago
- A Dataset and Results for Classifying Emotions Across Languages☆10Jun 20, 2021Updated 4 years ago
- Belief in the Machine: Investigating Epistemological Blind Spots of Language Models☆32Apr 19, 2025Updated 11 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"☆47May 31, 2024Updated last year
- [ICASSP 2022] Official PyTorch Implementation for "Attention Probe: Vision Transformer Distillation in the Wild" (ICASSP 2022)☆11Jan 23, 2022Updated 4 years ago
- ☆10Jul 13, 2024Updated last year
- ☆15Apr 26, 2025Updated 10 months ago
- ☆23Jun 13, 2024Updated last year
- Automatically modelling and distilling knowledge within AI. In other words, summarising the AI research firehose.☆24Mar 15, 2019Updated 7 years ago
- Official implementation of UnifiedReward & UnifiedReward-Think☆18Jun 18, 2025Updated 9 months ago
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"☆23Feb 6, 2025Updated last year
- Code for the paper "Understanding RL Vision"☆51Apr 2, 2023Updated 2 years ago
- Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge☆17Nov 16, 2021Updated 4 years ago
- Simple phoenix setup for padded window management☆13Apr 25, 2018Updated 7 years ago
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind".☆15Apr 27, 2023Updated 2 years ago
- On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)☆13Nov 21, 2021Updated 4 years ago
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data☆14Feb 26, 2025Updated last year
- ☆10Apr 14, 2021Updated 4 years ago
- ☆20Apr 12, 2024Updated last year
- Open-source library for scalable, reproducible evaluation of AI models and benchmarks.☆240Updated this week
- A Domain-Specific Language, Jailbreak Attack Synthesizer and Dynamic LLM Redteaming Toolkit☆27Dec 5, 2024Updated last year
- Python package to compute interaction indices that extend the Shapley Value. AISTATS 2023.☆19Sep 25, 2023Updated 2 years ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models☆20Feb 23, 2026Updated last month
- LSTM-VAE for Time Series Anomaly Detection☆10Feb 21, 2021Updated 5 years ago
- Code for the paper "Symmetric Machine Theory of Mind", presented at ICML 2022.☆12Jul 18, 2022Updated 3 years ago
- This repo contains the ToMnet+ model for preference inference. Developed by Yun-Shiuan, Edwinn, Hsin-Yi, and Elaine.☆10Feb 24, 2023Updated 3 years ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated last year
- Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends [NeurIPS 2023]☆10Jan 28, 2024Updated 2 years ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)☆36Dec 17, 2024Updated last year
- [WSDM 2025] Source code for "Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Cali…☆14Oct 14, 2025Updated 5 months ago