Dongping-Chen / MixSet
Official code repository for MixSet.
☆21 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for MixSet
- LLMDet is a text detection tool that can identify which source a given text came from (e.g., a large language model or a human writer). ☆50 · Updated 5 months ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆60 · Updated last month
- SeqXGPT: An advanced method for sentence-level AI-generated text detection. ☆75 · Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024) ☆58 · Updated last month
- [EMNLP 2024] Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆32 · Updated 2 weeks ago
- Multilingual safety benchmark for Large Language Models ☆22 · Updated 2 months ago
- Official code implementation of SKU, accepted by ACL 2024 Findings ☆11 · Updated 5 months ago
- DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text ☆25 · Updated last year
- Weak-to-Strong Jailbreaking on Large Language Models ☆64 · Updated 8 months ago
- Code & data for our paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆59 · Updated 8 months ago
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆69 · Updated 2 months ago
- [TACL] Code for "Red Teaming Language Model Detectors with Language Models" ☆16 · Updated 11 months ago
- [AAAI 2024] The official repository for our paper "OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially …" ☆34 · Updated 3 weeks ago
- Code for the paper "ConDA: Contrastive Domain Adaptation for AI-generated Text Detection" ☆32 · Updated 10 months ago
- Official code for the paper "Evaluating Copyright Takedown Methods for Language Models" ☆15 · Updated 3 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆43 · Updated last year
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆21 · Updated 4 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ☆23 · Updated last year
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?" ☆51 · Updated 3 months ago
- A collection of papers on models' trustworthy applications, covering topics such as model evaluation & analysis, security, c… ☆19 · Updated last year
- Watermarking Text Generated by Black-Box Language Models ☆30 · Updated 11 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆81 · Updated last month
- Code for the 2024 arXiv publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Mo…" ☆21 · Updated 4 months ago
- An easy-to-use hallucination detection framework for LLMs. ☆49 · Updated 6 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆83 · Updated 5 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆52 · Updated 2 weeks ago