SalesforceAIResearch / indict_code_gen
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
☆13 · Updated last month
Alternatives and similar repositories for indict_code_gen
Users interested in indict_code_gen are comparing it to the repositories listed below.
- Codebase for Inference-Time Policy Adapters ☆24 · Updated 2 years ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆85 · Updated last year
- Training and Benchmarking LLMs for Code Preference ☆37 · Updated last year
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆176 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆123 · Updated last year
- ☆41 · Updated 7 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆98 · Updated last year
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆71 · Updated last year
- Bayesian scaling laws for in-context learning ☆15 · Updated 9 months ago
- Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback" ☆36 · Updated 5 months ago
- Benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses" ☆30 · Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- ☆30 · Updated this week
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆125 · Updated last year
- ☆31 · Updated 2 years ago
- Code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆55 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆117 · Updated 10 months ago
- ☆59 · Updated 2 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆158 · Updated 6 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing (Oral, ACL 2024 SRW) ☆64 · Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location ☆84 · Updated last year
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆77 · Updated 11 months ago
- Official repository of the paper "On the Exploitability of Instruction Tuning" ☆66 · Updated last year
- Code repo for the paper "Attacking Vision-Language Computer Agents via Pop-ups" ☆48 · Updated last year
- ☆12 · Updated last year
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆63 · Updated last year
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆163 · Updated last year
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ☆25 · Updated last year
- ☆107 · Updated last year
- ☆20 · Updated 6 months ago