felixbinder / introspection_self_prediction
Code for experiments on self-prediction as a way to measure introspection in LLMs
☆16 · Dec 10, 2024 · Updated last year
Alternatives and similar repositories for introspection_self_prediction
Users that are interested in introspection_self_prediction are comparing it to the libraries listed below
- ☆24 · Dec 8, 2024 · Updated last year
- NeurIPS'24 - LLM Safety Landscape ☆39 · Oct 21, 2025 · Updated 3 months ago
- Code repo for the model organisms and convergent directions of EM papers. ☆49 · Sep 22, 2025 · Updated 4 months ago
- ☆34 · Feb 20, 2025 · Updated 11 months ago
- ☆35 · May 21, 2025 · Updated 8 months ago
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆34 · Mar 8, 2025 · Updated 11 months ago
- Measuring the situational awareness of language models ☆40 · Feb 12, 2024 · Updated 2 years ago
- Supporting code for the paper "Bayesian Model Selection, the Marginal Likelihood, and Generalization". ☆36 · Jun 16, 2022 · Updated 3 years ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 10 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆34 · Aug 15, 2024 · Updated last year
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆38 · May 24, 2024 · Updated last year
- Build an AI bot in Discord to serve user's personalized reports on what's up in tech ☆28 · Sep 14, 2025 · Updated 5 months ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- [KDD'23] This is the code repo for our KDD'23 paper "DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling". ☆11 · Jun 14, 2023 · Updated 2 years ago
- Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023) ☆13 · Nov 25, 2024 · Updated last year
- my profile readme ☆14 · Updated this week
- ☆12 · Jul 8, 2024 · Updated last year
- ☆11 · Nov 10, 2020 · Updated 5 years ago
- The course work repo for UoSurrey EEEM071 (2023 Spring) ☆11 · May 9, 2023 · Updated 2 years ago
- A high performance I/O library for deep learning in Julia, based on the PyTorch WebDataset library ☆14 · Dec 18, 2025 · Updated last month
- Predicting the Stock Market - Can we do it? ☆10 · Jul 24, 2021 · Updated 4 years ago
- Unofficial implementation of "Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle" ☆13 · Jul 3, 2024 · Updated last year
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc... ☆14 · May 28, 2025 · Updated 8 months ago
- A Zen approach to configuring your Python project ☆15 · Feb 5, 2026 · Updated last week
- ☆14 · May 21, 2024 · Updated last year
- ☆17 · May 3, 2025 · Updated 9 months ago
- A Java-based framework for combinatorial test input generation, fault characterization and automated test execution. ☆11 · Jan 22, 2024 · Updated 2 years ago
- 1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu ☆29 · Sep 8, 2025 · Updated 5 months ago
- 🤖 Implementation of Self Normalizing Networks (SNN) in PyTorch. ☆12 · Jun 19, 2017 · Updated 8 years ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 6 months ago
- Python package for compressing floating-point PyTorch tensors ☆13 · Jul 22, 2024 · Updated last year
- Official Repo of Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents ☆58 · Oct 28, 2025 · Updated 3 months ago
- ☆13 · Aug 7, 2024 · Updated last year
- A Benchmark for Multi-Stage Legal Case Documents Generation ☆14 · Feb 24, 2025 · Updated 11 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion ☆11 · Apr 1, 2024 · Updated last year
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024) ☆49 · Jan 15, 2026 · Updated last month
- ☆16 · Mar 22, 2025 · Updated 10 months ago