☆92Jan 30, 2026Updated last month
Alternatives and similar repositories for baxbench
Users who are interested in baxbench are comparing it to the repositories listed below
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation☆73Mar 13, 2026Updated last week
- ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.☆37Jul 20, 2025Updated 8 months ago
- ☆13Sep 23, 2022Updated 3 years ago
- ☆51Jul 16, 2024Updated last year
- Guardrails for secure and robust agent development☆399Jan 12, 2026Updated 2 months ago
- Synthesized models for PHOG to make the results reproducible by the research community☆11Jan 23, 2020Updated 6 years ago
- Certifying Geometric Robustness of Neural Networks☆16Mar 24, 2023Updated 2 years ago
- ☆72Nov 7, 2025Updated 4 months ago
- SRI Group Website☆10Updated this week
- ☆13Jun 24, 2025Updated 8 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs).☆112Mar 12, 2024Updated 2 years ago
- A bash script that turns a version-controlled paper into a cool timelapse.☆13Mar 21, 2013Updated 12 years ago
- ☆21May 23, 2025Updated 9 months ago
- Generating Adversarial Examples for Holding Robustness of Source Code Processing Models☆15Dec 2, 2021Updated 4 years ago
- 🔮Reasoning for Safer Code Generation; 🥇Winner Solution of Amazon Nova AI Challenge 2025☆36Aug 24, 2025Updated 6 months ago
- ☆21Aug 30, 2022Updated 3 years ago
- A Synthetic Dataset for Personal Attribute Inference (NeurIPS'24 D&B)☆53Jul 27, 2025Updated 7 months ago
- G'n'T Eval is an evaluation suite for carrying out pen-and-paper evaluations. It ships with all necessary tools, i.e. management …☆14Nov 2, 2013Updated 12 years ago
- Notes and insights about OpenAI's Code Interpreter☆13Jul 26, 2023Updated 2 years ago
- This JavaScript CLI "undeletes" packages that have been removed from the NPM registry☆29Updated this week
- Latent Space Smoothing for Individually Fair Representations (ECCV 2022)☆15Nov 4, 2022Updated 3 years ago
- [NeurIPS 2019] H. Chen*, H. Zhang*, S. Si, Y. Li, D. Boning and C.-J. Hsieh, Robustness Verification of Tree-based Models (*equal contrib…☆27Jun 15, 2019Updated 6 years ago
- A benchmark dataset for evaluating LLM's SVG editing capabilities☆36Oct 17, 2024Updated last year
- Sample CloudFormation template to create spot fleet request☆11Mar 23, 2016Updated 9 years ago
- ☆12Jul 8, 2023Updated 2 years ago
- Enhancing Code Pre-trained Models by Contrastive Learning☆38Mar 8, 2023Updated 3 years ago
- ☆72Feb 16, 2025Updated last year
- ☆20Apr 10, 2025Updated 11 months ago
- This is the replication package of V-SZZ, which has been accepted by ICSE2022☆16Jan 19, 2026Updated 2 months ago
- Efficient non-uniform quantization with GPTQ for GGUF☆63Sep 17, 2025Updated 6 months ago
- Private and Reliable Neural Network Inference (CCS '22)☆22Jul 11, 2023Updated 2 years ago
- An experiment with AWS SpotFleets and ECS☆16Oct 12, 2022Updated 3 years ago
- ☆93Mar 6, 2026Updated 2 weeks ago
- A low-cost approach to testing AI chat experiences and security concepts☆40Jul 23, 2025Updated 7 months ago
- Infrastructure-as-code for a serverless knowledge base using Amazon Bedrock, Aurora PostgreSQL (with pgvector), Lambda, and S3. This setu…☆19Mar 23, 2025Updated 11 months ago
- Code for the paper "Firewalls to Secure Dynamic LLM Agentic Networks"☆29Jun 6, 2025Updated 9 months ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- Symbolic (analytical) polyhedron projection by Fourier-Motzkin elimination using SymPy☆11Oct 17, 2019Updated 6 years ago
- Clover: Closed-Loop Verifiable Code Generation☆45May 12, 2025Updated 10 months ago