[COLING 2025] Official repo of paper: "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
☆12 · Jul 26, 2024 · Updated last year
Alternatives and similar repositories for BabyBLUE-llm
Users that are interested in BabyBLUE-llm are comparing it to the libraries listed below
- [EMNLP 2024 Main] Official repository of paper "SLANG: New Concept Comprehension of Large Language Models" ☆14 · Oct 27, 2024 · Updated last year
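- 🎙️ A fully automated podcast generation system for academic papers. It crawls the latest science and technology news from arXiv, uses an LLM to generate structured dialogue scripts, and produces professional podcast audio through speech synthesis: an AI podcast tool that combines news collection, content generation, and speech synthesis in one. ☆25 · Nov 1, 2024 · Updated last year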
- Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage" ☆27 · Dec 6, 2024 · Updated last year
- Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness" ☆21 · Feb 17, 2025 · Updated last year
- ☆18 · Mar 30, 2025 · Updated 11 months ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models ☆33 · May 21, 2025 · Updated 9 months ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆66 · Aug 25, 2024 · Updated last year
- ☆40 · May 17, 2025 · Updated 9 months ago
- Auditing agents for fine-tuning safety ☆20 · Oct 21, 2025 · Updated 4 months ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM ☆39 · Jan 17, 2025 · Updated last year
- ☆49 · Apr 4, 2025 · Updated 11 months ago
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns ☆13 · Mar 1, 2025 · Updated last year
- ☆19 · May 14, 2025 · Updated 9 months ago
- ☆26 · Sep 3, 2025 · Updated 6 months ago
- The official implementation of the paper "Large Scale Knowledge Washing" ☆10 · Jun 12, 2024 · Updated last year
- [NeurIPS 2024 D&B] DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios ☆14 · Nov 19, 2024 · Updated last year
- A parameter server implementation with MPI. ☆11 · Nov 15, 2017 · Updated 8 years ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆11 · Jun 18, 2024 · Updated last year
- ICML2025: One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework ☆14 · Jun 24, 2025 · Updated 8 months ago
- ☆19 · Mar 18, 2025 · Updated 11 months ago
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Updated this week
- ☆10 · Jan 15, 2018 · Updated 8 years ago
- ☆16 · Nov 18, 2024 · Updated last year
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs ☆13 · Jun 20, 2025 · Updated 8 months ago
- Project of ACL 2025 "UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models" ☆14 · Mar 25, 2025 · Updated 11 months ago
- [EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners ☆19 · Nov 17, 2025 · Updated 3 months ago
- Minimal coding, computer-use and deep research agents using the OpenAI Agents SDK ☆31 · Updated this week
- ☆20 · Jan 5, 2026 · Updated 2 months ago
- ☆16 · Feb 17, 2025 · Updated last year
- Open OnDemand Application Collection for SJTU HPC ☆13 · May 20, 2021 · Updated 4 years ago
- Official repository for WWW'24 paper "MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation" ☆12 · Jul 25, 2024 · Updated last year
- Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge ☆19 · Mar 1, 2026 · Updated last week
- Benchmarking Large Language Models' Resistance to Malicious Code ☆14 · Dec 1, 2024 · Updated last year
- Code for the paper "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM" ☆14 · Nov 17, 2023 · Updated 2 years ago
- Valve Source Engine hooking on OS X using libembryo; no SDK required ☆10 · Sep 21, 2016 · Updated 9 years ago
- Create a LangChain ReAct agent with multiple tools (Python REPL and DuckDuckGo Search) ☆13 · Updated this week
- [NDSS'25] The official implementation of safety misalignment. ☆17 · Jan 8, 2025 · Updated last year
- Implementation of the dataset defined in Spiking Neural Networks for event-based action recognition: A new task to understand their adva… ☆15 · Aug 9, 2023 · Updated 2 years ago
- ☆15 · Jun 7, 2024 · Updated last year