zichuan-liu / IB4LLMs
[NeurIPS'24] Protecting Your LLMs with Information Bottleneck
☆14 Updated 5 months ago
Alternatives and similar repositories for IB4LLMs:
Users who are interested in IB4LLMs are comparing it to the libraries listed below.
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆26 Updated 2 months ago
- JAILJUDGE: A comprehensive evaluation benchmark which includes a wide range of risk scenarios with complex malicious prompts (e.g., synth… ☆44 Updated 4 months ago
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging ☆20 Updated 2 months ago
- ☆18 Updated 6 months ago
- ☆21 Updated 9 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆57 Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆15 Updated 3 months ago
- [ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast ☆100 Updated last year
- [ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs. ☆30 Updated 2 months ago
- ☆20 Updated 3 months ago
- ☆25 Updated 2 months ago
- ☆14 Updated 3 weeks ago
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St… ☆21 Updated 11 months ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆74 Updated 2 weeks ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆29 Updated 4 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆60 Updated 3 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆20 Updated 2 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- ☆32 Updated 6 months ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla… ☆45 Updated 2 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆40 Updated 5 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆58 Updated last year
- ☆67 Updated last month
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆32 Updated last month
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆49 Updated 5 months ago
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering ☆11 Updated this week
- [ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents ☆80 Updated 2 months ago
- ☆22 Updated 9 months ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆41 Updated 3 months ago
- ☆43 Updated 2 months ago