☆34 · Nov 7, 2024 · Updated last year
Alternatives and similar repositories for Truth_is_Universal
Users that are interested in Truth_is_Universal are comparing it to the libraries listed below.
- ☆107 · Aug 8, 2024 · Updated last year
- Improving transparency of large language models' reasoning ☆15 · Nov 25, 2025 · Updated 5 months ago
- ☆92 · Jan 22, 2025 · Updated last year
- Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF" ☆19 · Oct 11, 2024 · Updated last year
- A series of BERT and Albert model checkpoints trained to reduce gendered correlations in pre-training ☆11 · Oct 22, 2020 · Updated 5 years ago
- Creating a game to play Figgie & Train an agent to play against ☆15 · Dec 3, 2022 · Updated 3 years ago
- ☆32 · Nov 16, 2025 · Updated 5 months ago
- Code for "On Measuring Faithfulness of Natural Language Explanations" ☆22 · Jul 23, 2024 · Updated last year
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆72 · Jun 19, 2024 · Updated last year
- Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-tur… ☆25 · Dec 3, 2024 · Updated last year
- ☆17 · Feb 26, 2024 · Updated 2 years ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024 ☆12 · May 24, 2024 · Updated last year
- Code for "Astraea: Grammar-based Fairness Testing" ☆10 · Jan 7, 2022 · Updated 4 years ago
- ☆16 · Apr 27, 2024 · Updated 2 years ago
- Explore, Establish, Exploit: Red Teaming Language Models from Scratch ☆15 · Jun 21, 2023 · Updated 2 years ago
- Intersectional bias in hate speech and abusive language datasets ☆15 · Jan 25, 2024 · Updated 2 years ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆65 · Oct 27, 2024 · Updated last year
- This repo for the paper titled "SC-MIL: Sparsely Coded Multiple Instance Learning for Whole Slide Image Classification" ☆12 · Apr 25, 2024 · Updated 2 years ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆45 · Jan 15, 2025 · Updated last year
- ☆18 · May 18, 2021 · Updated 4 years ago
- Sparse probing paper full code. ☆67 · Dec 17, 2023 · Updated 2 years ago
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 10 months ago
- ☆20 · Aug 26, 2018 · Updated 7 years ago
- Data set for LREC 2020 paper "I Feel Offended, Don't Be Abusive!" ☆19 · Sep 23, 2023 · Updated 2 years ago
- 😇 A curated list of links and resources for Fair ML and Data Ethics ☆18 · May 31, 2022 · Updated 3 years ago
- Discriminative Feature Selection via A Structured Sparse Subspace Learning Module ☆12 · Apr 15, 2022 · Updated 4 years ago
- ☆95 · Mar 13, 2026 · Updated last month
- Learning Certified Individually Fair Representations ☆24 · Nov 7, 2020 · Updated 5 years ago
- Interpretable unified language safety checking with large language models ☆32 · Apr 15, 2023 · Updated 3 years ago
- My personal CV in a git-inspired style. Check it out! ☆10 · Mar 29, 2024 · Updated 2 years ago
- Establishing Quantified Uncertainty in Neural Networks ☆15 · Apr 13, 2026 · Updated 3 weeks ago
- ☆12 · Mar 19, 2021 · Updated 5 years ago
- ☆16 · Oct 18, 2024 · Updated last year
- ☆18 · Aug 19, 2024 · Updated last year
- [ECCV'24] cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process ☆17 · Sep 10, 2024 · Updated last year
- ☆29 · Feb 24, 2025 · Updated last year
- 2 Hour Project: iOS app to create custom app icons inspired by the viral AI-generated icons trend. Upload your home screen, choose a them… ☆16 · Sep 3, 2024 · Updated last year
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- minimalistic AI library that resembles HF's transformers ☆13 · Dec 31, 2024 · Updated last year