logic-star-ai / baxbench
☆33 · Updated 3 months ago
Alternatives and similar repositories for baxbench
Users interested in baxbench are comparing it to the repositories listed below
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆175 · Updated this week
- ☆44 · Updated 10 months ago
- ☆114 · Updated 10 months ago
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation ☆49 · Updated last month
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis… ☆71 · Updated last year
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆49 · Updated 2 months ago
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment" ☆14 · Updated 2 months ago
- ☆43 · Updated 8 months ago
- ☆109 · Updated 2 weeks ago
- Code for the AAAI 2023 paper "CodeAttack: Code-based Adversarial Attacks for Pre-Trained Programming Language Models" ☆30 · Updated 2 years ago
- This repository contains the replication package of our paper "Assessing the Security of GitHub Copilot’s Generated Code - A Targeted Rep… ☆10 · Updated last year
- ☆39 · Updated 7 months ago
- Code to break Llama Guard ☆31 · Updated last year
- EvoEval: Evolving Coding Benchmarks via LLM ☆73 · Updated last year
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks. ☆69 · Updated last year
- Dataset for the Tensor Trust project ☆40 · Updated last year
- CodeGuard+: Constrained Decoding for Secure Code Generation ☆11 · Updated 10 months ago
- Code to generate NeuralExecs (prompt injection for LLMs) ☆22 · Updated 6 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆49 · Updated 7 months ago
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks" ☆53 · Updated 10 months ago
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD) ☆17 · Updated 3 weeks ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆110 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆79 · Updated 6 months ago
- Simultaneous evaluation on both functionality and security of LLM-generated code. ☆19 · Updated 4 months ago
- An autonomous LLM-agent for large-scale, repository-level code auditing ☆52 · Updated last month
- CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot ☆11 · Updated last year
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆35 · Updated last month
- Repository for PrimeVul Vulnerability Detection Dataset ☆146 · Updated 9 months ago
- ☆25 · Updated 8 months ago
- ☆20 · Updated last year