Zhongping-Zhang / MGT_LocalizationLinks
Implementation for Machine-Generated Text Localization (ACL 2024 Findings)
☆15Updated last year
Alternatives and similar repositories for MGT_Localization
Users that are interested in MGT_Localization are comparing it to the libraries listed below
Sorting:
- [NeurIPS 2024 D&B] DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios☆45Updated last year
- The lastest paper about detection of LLM-generated text and code☆284Updated 7 months ago
- A survey on harmful fine-tuning attack for large language model☆232Updated last month
- ☆25Updated 6 months ago
- Code for ACL 2024 paper: PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models.☆16Updated last year
- Accepted by ECCV 2024☆185Updated last year
- ☆158Updated 2 years ago
- [ICLR 2024] Towards Elminating Hard Label Constraints in Gradient Inverision Attacks☆14Updated 2 years ago
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"☆30Updated 6 months ago
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…☆36Updated 10 months ago
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Updated 3 weeks ago
- Code base for ICLR 2025 "Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection"☆54Updated 5 months ago
- ☆27Updated last month
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety☆53Updated 6 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)☆79Updated last year
- ☆17Updated 9 months ago
- Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation(ICLR 2025)”☆24Updated 3 months ago
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…☆79Updated last year
- Implementation of the paper Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs☆13Updated 8 months ago
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.☆125Updated 5 months ago
- ☆33Updated 10 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.☆84Updated last year
- [AAMAS 2025] Privacy-preserving and Personalized RLHF, with convergence guarantees. The Code contains experiments for training multiple i…☆14Updated 9 months ago
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature☆450Updated 2 years ago
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"☆47Updated 3 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep☆173Updated 9 months ago
- A curated list of resources for activation engineering☆123Updated 4 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆89Updated 10 months ago
- awesome SAE papers☆71Updated 8 months ago
- The reinforcement learning codes for dataset SPA-VL☆44Updated last year