Social-AI-Studio / ToxiCloakCN
Official repository for EMNLP'24 paper "ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations"
☆42 Updated 8 months ago
Alternatives and similar repositories for ToxiCloakCN
Users who are interested in ToxiCloakCN are comparing it to the repositories listed below.
- The code and resource of "Towards Comprehensive Detection of Chinese Harmful Memes" (NeurIPS2024 D&B). ☆45 Updated last month
- The code and resource of "Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmark"… ☆79 Updated last month
- [ICLR 2025] Released code for paper "Spurious Forgetting in Continual Learning of Language Models" ☆47 Updated last month
- ☆20 Updated 3 weeks ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆126 Updated 9 months ago
- ☆17 Updated 3 months ago
- ☆27 Updated 2 years ago
- Advanced interview notes for large language models (大模型进阶面经) ☆53 Updated last month
- [ICML2024] Adaptive Text Watermark for Large Language Models ☆21 Updated 6 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ☆29 Updated last year
- ☆33 Updated 8 months ago
- ☆82 Updated last year
- ☆44 Updated last year
- ☆49 Updated last year
- Awesome-Large-Search-Models is a collection of papers and resources (Methods, Datasets and other resources) about search-oriented large r… ☆110 Updated last week
- A collection of survey papers and resources related to Large Language Models (LLMs). ☆40 Updated last year
- Code for paper 'Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning' ☆16 Updated last year
- Official Repository for "Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Samplin… ☆22 Updated 10 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆64 Updated this week
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆24 Updated 11 months ago
- ☆37 Updated this week
- ☆22 Updated 11 months ago
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models ☆13 Updated last month
- ☆28 Updated 11 months ago
- ☆21 Updated 3 months ago
- The code and data of DPA-RAG, accepted by WWW 2025 main conference. ☆61 Updated 5 months ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆44 Updated 5 months ago
- ☆84 Updated last year
- [ACL 2024] Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models ☆38 Updated last year
- ☆24 Updated 2 years ago