RobustNLP / TestNER
A toolkit for testing and improving named entity recognition [ESEC/FSE'23]
☆11 Updated last year
Alternatives and similar repositories for TestNER:
Users who are interested in TestNER are comparing it to the repositories listed below.
- ☆11 Updated 2 months ago
- [ICSE'25] Aligning the Objective of LLM-based Program Repair ☆14 Updated 3 weeks ago
- A toolkit for testing machine translation [ICSE'20, '21, ESEC/FSE'20] ☆33 Updated 3 years ago
- An Evolving Code Generation Benchmark Aligned with Real-world Code Repositories ☆54 Updated 7 months ago
- ☆18 Updated 4 months ago
- A lightweight tool for detecting bugs on Graph Database Management Systems ☆14 Updated last year
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆68 Updated 8 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆58 Updated 2 months ago
- [TOSEM 2023] A Survey of Learning-based Automated Program Repair ☆69 Updated 10 months ago
- MTTM: Metamorphic Testing for Textual Content Moderation Software ☆32 Updated 2 years ago
- [LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization ☆36 Updated 3 weeks ago
- ☆11 Updated last year
- For our ICSE23 paper "Impact of Code Language Models on Automated Program Repair" by Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan ☆59 Updated 5 months ago
- EvoEval: Evolving Coding Benchmarks via LLM ☆68 Updated 11 months ago
- Structure-Invariant Testing for Machine Translation [ICSE'20] ☆16 Updated 4 years ago
- Multilingual safety benchmark for Large Language Models ☆49 Updated 6 months ago
- ☆36 Updated 2 months ago
- ☆25 Updated 2 years ago
- ☆23 Updated 11 months ago
- ☆14 Updated last year
- Source Code for Paper "Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning" ☆16 Updated last year
- ☆34 Updated 8 months ago
- [EMNLP 2024] CodeJudge: Evaluating Code Generation with Large Language Models ☆38 Updated 3 months ago
- A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories ☆20 Updated 6 months ago
- Benchmark ClassEval for class-level code generation. ☆136 Updated 5 months ago
- [ICLR 2024]: Is Self-Repair a Silver Bullet for Code Generation? ☆13 Updated 10 months ago
- Replication Package for "Natural Attack for Pre-trained Models of Code", ICSE 2022 ☆46 Updated 7 months ago
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment" ☆14 Updated this week
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆31 Updated 3 weeks ago
- ☆27 Updated 2 years ago