SalesforceAIResearch / indict_code_gen
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
☆13 Updated 4 months ago
Alternatives and similar repositories for indict_code_gen
Users who are interested in indict_code_gen are comparing it to the libraries listed below
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆78 Updated 6 months ago
- ☆18 Updated 6 months ago
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 6 months ago
- Codebase for Inference-Time Policy Adapters ☆23 Updated last year
- ☆75 Updated last month
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆68 Updated last year
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆45 Updated last month
- ☆54 Updated 2 years ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆55 Updated 2 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 Updated last year
- Code for paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) ☆87 Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆93 Updated 11 months ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. ☆19 Updated last year
- ☆24 Updated 6 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 11 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆61 Updated 4 months ago
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting ☆16 Updated last month
- ☆99 Updated last week
- Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules" ☆45 Updated 4 months ago
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts ☆31 Updated 10 months ago
- ☆23 Updated 7 months ago
- ☆18 Updated 10 months ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety. ☆82 Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 Updated 9 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 Updated last year
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆60 Updated 7 months ago
- ☆18 Updated last year
- ☆32 Updated last week
- ☆31 Updated last year
- ☆26 Updated 4 months ago