Code for our paper "Localizing Lying in Llama"
☆13Apr 24, 2025Updated 11 months ago
Alternatives and similar repositories for llama-lying
Users who are interested in llama-lying are comparing it to the libraries listed below.
- ☆10Oct 31, 2022Updated 3 years ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 9 months ago
- ☆12Jul 11, 2021Updated 4 years ago
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes☆12Jun 12, 2023Updated 2 years ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.☆25Jan 26, 2024Updated 2 years ago
- The official code for ICML 2024 "FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Recons…☆29Jun 6, 2024Updated last year
- A python sdk for LLM finetuning and inference on runpod infrastructure☆23Mar 16, 2026Updated last week
- Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples☆30Jul 11, 2023Updated 2 years ago
- Code to break Llama Guard☆32Dec 7, 2023Updated 2 years ago
- code of paper "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM"☆14Nov 17, 2023Updated 2 years ago
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders"☆23Feb 6, 2025Updated last year
- ☆13Jun 4, 2025Updated 9 months ago
- A .NET Portable Class Library for accessing Bing REST Services.☆11Mar 22, 2014Updated 12 years ago
- Polyglot skipgram embeddings, and their many health benefits☆12Feb 2, 2020Updated 6 years ago
- A blog where I write about research papers and blog posts I read.☆12Nov 20, 2024Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"☆47May 31, 2024Updated last year
- ☆12Jul 8, 2024Updated last year
- Bash, vim and sundry configuration files☆11Mar 20, 2025Updated last year
- This repo provides a couple of easy-to-use template scripts to help you set up a custom jupyter kernel on a AWS Sagemaker Jupyter Noteboo…☆11Jun 17, 2021Updated 4 years ago
- A machine learning model to recommend movies & tv series☆11Oct 26, 2020Updated 5 years ago
- Downloads Bing, NASA, National Geographic, Unsplash Photo of the Day and sets it as wallpaper☆13Mar 3, 2025Updated last year
- Standalone MSBuild integration of CodeContracts (by Microsoft Research)☆17Sep 11, 2019Updated 6 years ago
- Code Repository for the Paper ---Revisiting the Assumption of Latent Separability for Backdoor Defenses (ICLR 2023)☆47Feb 28, 2023Updated 3 years ago
- ☆13Jun 25, 2021Updated 4 years ago
- Webcrawler to collect cookie data and usage purposes, built using OpenWPM.☆13Jun 3, 2023Updated 2 years ago
- A Visual Question Answering model implemented in MindSpore and PyTorch. The model is a reimplementation of the paper *Show, Ask, Attend, …☆10Jul 27, 2021Updated 4 years ago
- Code associated with the paper "Inducing brain-relevant bias in natural language processing models" in the proceedings of the 33rd Confer…☆13Nov 13, 2019Updated 6 years ago
- Code to reproduce paper "Exploring Longitudinal Effects of Session-based Recommendations" in RecSys 2020☆15Nov 21, 2022Updated 3 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations.☆217Mar 16, 2026Updated last week
- Applying backdoor attacks to BadNet on MNIST and ResNet on CIFAR10.☆13Aug 25, 2021Updated 4 years ago
- An iOS app to allow you to verify a picture was taken on an iPhone via hardware signing and cryptographic proving. Inspired as the revers…☆12Feb 25, 2024Updated 2 years ago
- Completely remove Gemini’s SynthID security so it can’t detect that your image was made with AI. Simply clone the repository locally, run…☆31Dec 10, 2025Updated 3 months ago
- A method for training neural networks that are provably robust to adversarial attacks. [IJCAI 2019]☆10Sep 3, 2019Updated 6 years ago
- Source code of FedAttack.☆11Feb 9, 2022Updated 4 years ago
- ☆50Oct 24, 2023Updated 2 years ago
- Transcripts for various Youtube Channels inspired by https://karpathy.ai/lexicap/index.html☆15Nov 14, 2025Updated 4 months ago
- ☆11Jan 4, 2023Updated 3 years ago
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…☆11Aug 24, 2022Updated 3 years ago
- ☆50Aug 30, 2024Updated last year