jam3scampbell / llama-lying
Code for our paper "Localizing Lying in Llama"
☆13 · Apr 24, 2025 · Updated 9 months ago
Alternatives and similar repositories for llama-lying
Users that are interested in llama-lying are comparing it to the libraries listed below
- ☆10 · Oct 31, 2022 · Updated 3 years ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs ☆13 · Jun 20, 2025 · Updated 7 months ago
- A Python SDK for LLM fine-tuning and inference on RunPod infrastructure ☆17 · Updated this week
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes ☆12 · Jun 12, 2023 · Updated 2 years ago
- ☆12 · Jul 11, 2021 · Updated 4 years ago
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders" ☆24 · Feb 6, 2025 · Updated last year
- Code for "Preventing Language Models From Hiding Their Reasoning", which evaluates defenses against LLM steganography. ☆25 · Jan 26, 2024 · Updated 2 years ago
- The official code for ICML 2024 "FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Recons… ☆29 · Jun 6, 2024 · Updated last year
- Code repository for the paper [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples ☆30 · Jul 11, 2023 · Updated 2 years ago
- Decoder-only transformer, built from scratch with PyTorch ☆32 · Oct 22, 2023 · Updated 2 years ago
- Code to break Llama Guard ☆32 · Dec 7, 2023 · Updated 2 years ago
- The public web API of the National Museum of Australia ☆11 · Sep 12, 2023 · Updated 2 years ago
- Completely remove Gemini’s SynthID security so it can’t detect that your image was made with AI. Simply clone the repository locally, run… ☆22 · Dec 10, 2025 · Updated 2 months ago
- A machine learning model to recommend movies & TV series ☆11 · Oct 26, 2020 · Updated 5 years ago
- AlgZoo: uninterpreted models with fewer than 1,500 parameters ☆40 · Jan 19, 2026 · Updated 3 weeks ago
- ☆12 · Jul 8, 2024 · Updated last year
- Run a raffle among the 🌟 stargazers 🌟 of a GitHub project! ☆11 · Mar 23, 2023 · Updated 2 years ago
- Arduino library that adds over-the-air (OTA) update capabilities to bw16/rtl8720DN ☆11 · Aug 6, 2024 · Updated last year
- Streaming effects for PureScript ☆16 · Nov 8, 2021 · Updated 4 years ago
- Provide a simple, script-friendly interface for posting to Matrix channels ☆10 · Mar 9, 2018 · Updated 7 years ago
- Repository for my dotfiles ☆12 · Dec 3, 2025 · Updated 2 months ago
- ☆13 · Dec 16, 2024 · Updated last year
- Kaggle Competition: IEEE-CIS-Fraud-Detection ☆10 · Jan 18, 2020 · Updated 6 years ago