felixbinder/introspection_self_prediction

Code for experiments on self-prediction as a way to measure introspection in LLMs

☆16

Alternatives and similar repositories for introspection_self_prediction

Users who are interested in introspection_self_prediction are comparing it to the repositories listed below.

UKGovernmentBEIS / as-evaluation-standard
A repository that holds templates, examples, and tests to help external parties submit tasks to AISI that conform with the Autonomous Sys…
☆11 · Jan 16, 2026 · Updated 6 months ago
EleutherAI / deep-ignorance
☆20 · Jan 7, 2026 · Updated 6 months ago
QAMPspring2023 / qgpt-issue-31
qgpt-issue-31
☆11 · Oct 31, 2024 · Updated last year
aladinD / SafeMERGE
Code for SafeMERGE (ICLR 2025).
☆15 · Apr 1, 2025 · Updated last year
pbevan1 / Detecting-Melanoma-Fairly
Implementation for MICCAI DART paper: 'Detecting Melanoma Fairly: Skin Tone Detection and Debiasing for Skin Lesion Classification'
☆18 · Jun 22, 2022 · Updated 4 years ago
1oannis / budget-lens
The open-source receipt scanner & expense tracker made for self-hosting
☆17 · Jul 17, 2025 · Updated last year
choidami / inductive-oocr
☆16 · Mar 22, 2025 · Updated last year
RUCAIBox / FIGA
[ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment"
☆10 · May 5, 2024 · Updated 2 years ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆24 · Dec 8, 2024 · Updated last year
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆35 · Aug 15, 2024 · Updated last year
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆41 · Feb 12, 2024 · Updated 2 years ago
jaechan-repo / muse_bench
☆33 · Aug 9, 2024 · Updated last year
FartyPants / VirtualLora
extension for text WebUI
☆20 · Aug 7, 2025 · Updated 11 months ago
DFRobot / DFRobot_AS7341
We live in a colorful world, but how much do you really know about color? Your eyes may deceive you, while the sensors don’t lie. This AS7…
☆12 · Jan 20, 2022 · Updated 4 years ago
Tomorrowdawn / top_nsigma
The official code repo and data hub of top_nsigma sampling strategy for LLMs.
☆26 · Feb 11, 2025 · Updated last year
alexa / places
This is the code for our paper: PLACES: Prompting Language Models for Social Conversation Synthesis
☆11 · Feb 17, 2023 · Updated 3 years ago
night-chen / DyGen
[KDD'23] This is the code repo for our KDD'23 paper "DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling".
☆11 · Jun 14, 2023 · Updated 3 years ago
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13 · Jun 20, 2025 · Updated last year
GeorgeDavila / handwriting2website
Automatically turn your handwritten journal entries into a website using GPT-3, OCR, Python, and HTML
☆13 · Dec 15, 2021 · Updated 4 years ago
apartresearch / DarkBench
Benchmarking Dark Patterns in LLMs (ICLR 2025)
☆18 · Mar 29, 2025 · Updated last year
rtaori / data_feedback
Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases"
☆18 · Sep 9, 2022 · Updated 3 years ago
UCSC-REAL / FLAT
[ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data
☆14 · Feb 26, 2025 · Updated last year
XuchanBao / behavioral-self-awareness
☆37 · Feb 20, 2025 · Updated last year
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆35 · Mar 8, 2025 · Updated last year
yaolu-zjut / DDInterpreter
☆15 · May 28, 2024 · Updated 2 years ago
Sanaelotfi / Bayesian_model_comparison
Supporting code for the paper "Bayesian Model Selection, the Marginal Likelihood, and Generalization".
☆37 · Jun 16, 2022 · Updated 4 years ago
an-tran528 / wavetransformer
Code base for WaveTransformer: A novel architecture for automated audio captioning
☆43 · Mar 1, 2021 · Updated 5 years ago
Anglebrackets / web_rag
Oobabooga Text-Gen Web UI extension: get web content, add to context
☆22 · Jun 1, 2024 · Updated 2 years ago
sweetpeach / hummingbird
Code and Hummingbird dataset for EMNLP 2021 paper "Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica"
☆14 · Apr 13, 2022 · Updated 4 years ago
genglinliu / UnknownBench
Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
☆14 · Feb 20, 2024 · Updated 2 years ago
aisa-group / decomposing-eval-awareness
Decomposing and measuring evaluation awareness in existing benchmarks and our proposed EvalAwareBench.
☆19 · Jun 1, 2026 · Updated last month
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024)
☆51 · Jan 15, 2026 · Updated 6 months ago
lpdkt / arch
my arch linux dotfiles
☆16 · Jan 22, 2025 · Updated last year
eujhwang / personalized-llms
personalized-llms with allen institute
☆13 · Jun 22, 2023 · Updated 3 years ago
scaleapi / propensity-evaluation
Open-source code for propensity evaluation
☆19 · Apr 25, 2026 · Updated 3 months ago
poloclub / llm-landscape
NeurIPS'24 - LLM Safety Landscape
☆40 · Oct 21, 2025 · Updated 9 months ago
lil-lab / cb2
An NLP research and data collection platform.
☆17 · Jul 4, 2026 · Updated 3 weeks ago
MKariya1998 / GMI-Attack
☆12 · Nov 10, 2020 · Updated 5 years ago
AI45Lab / REEF
The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…
☆79 · Jan 16, 2025 · Updated last year