☆41 · Jul 6, 2025 · Updated 8 months ago
Alternatives and similar repositories for Unsupervised-Elicitation
Users who are interested in Unsupervised-Elicitation are comparing it to the libraries listed below.
- ☆21 · Jun 22, 2025 · Updated 8 months ago
- ☆20 · Nov 15, 2024 · Updated last year
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 11 months ago
- ☆20 · May 25, 2024 · Updated last year
- Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR. ☆24 · Mar 27, 2025 · Updated 11 months ago
- PyTorch implementation of the paper "Texturify: Generating Textures on 3D Shape Surfaces" ☆16 · Dec 17, 2022 · Updated 3 years ago
- (ICLR25 Oral) Do as We Do, Not as You Think: the Conformity of Large Language Models ☆40 · Feb 6, 2026 · Updated last month
- The loss landscape of Large Language Models resembles a basin! ☆36 · Jul 8, 2025 · Updated 7 months ago
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆29 · Jun 4, 2024 · Updated last year
- Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses ☆29 · Feb 11, 2025 · Updated last year
- Official code repository for Correct-N-Contrast ☆23 · Jul 18, 2022 · Updated 3 years ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆191 · Jan 16, 2025 · Updated last year
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models" ☆30 · Apr 13, 2023 · Updated 2 years ago
- Build an AI bot in Discord to serve users personalized reports on what's up in tech ☆28 · Sep 14, 2025 · Updated 5 months ago
- Source code of the paper "An Unforgeable Publicly Verifiable Watermark for Large Language Models", accepted at ICLR 2024 ☆34 · May 23, 2024 · Updated last year
- Implementation of the paper "ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability" ☆63 · Jun 3, 2025 · Updated 9 months ago
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" ☆42 · Oct 3, 2025 · Updated 5 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆16 · Dec 10, 2024 · Updated last year
- Training project about Deep Learning ☆12 · Jun 22, 2017 · Updated 8 years ago
- my profile readme ☆14 · Updated this week
- ☆11 · Jun 18, 2023 · Updated 2 years ago
- ☆12 · Dec 14, 2022 · Updated 3 years ago
- Code for the paper "Semi-Conditional Normalizing Flows for Semi-Supervised Learning" ☆11 · Mar 30, 2020 · Updated 5 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- ☆12 · Jul 8, 2024 · Updated last year
- on-policy optimization baselines for deep reinforcement learning ☆32 · Apr 3, 2020 · Updated 5 years ago
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh… ☆11 · Nov 22, 2022 · Updated 3 years ago
- Implementation of Stochastic Gradient Descent algorithms in Python (cite https://doi.org/10.1007/s00158-020-02599-z) ☆11 · May 19, 2021 · Updated 4 years ago
- 1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu ☆29 · Sep 8, 2025 · Updated 5 months ago
- [ICML 2022 Spotlight] Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks ☆11 · May 21, 2023 · Updated 2 years ago
- ☆11 · Feb 22, 2020 · Updated 6 years ago
- ☆15 · Aug 19, 2025 · Updated 6 months ago
- Minimalist Redis Client for Erlang ☆19 · Jul 15, 2013 · Updated 12 years ago
- Python package for compressing floating-point PyTorch tensors ☆13 · Jul 22, 2024 · Updated last year
- ☆10 · Mar 6, 2022 · Updated 4 years ago
- ☆10 · Jan 28, 2024 · Updated 2 years ago
- ☆15 · Mar 13, 2025 · Updated 11 months ago
- ☆13 · Feb 14, 2022 · Updated 4 years ago
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc... ☆14 · May 28, 2025 · Updated 9 months ago