Official Repo of Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
☆65Oct 28, 2025Updated 4 months ago
Alternatives and similar repositories for Misevolution
Users that are interested in Misevolution are comparing it to the libraries listed below
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 8 months ago
- Project of ACL 2025 "UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models"☆14Mar 25, 2025Updated 11 months ago
- [NDSS'25] The official implementation of safety misalignment.☆17Jan 8, 2025Updated last year
- ☆23Oct 30, 2025Updated 4 months ago
- ☆19Jun 21, 2025Updated 8 months ago
- ☆44Oct 1, 2024Updated last year
- ☆24Dec 8, 2024Updated last year
- Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin".☆36Jul 10, 2025Updated 7 months ago
- ☆30May 22, 2024Updated last year
- [AAAI 2026] Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks☆40Nov 24, 2025Updated 3 months ago
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free☆54Apr 6, 2025Updated 11 months ago
- CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics☆27Nov 1, 2025Updated 4 months ago
- The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster …☆59Sep 11, 2025Updated 5 months ago
- This repo is for the safety topic, including attacks, defenses and studies related to reasoning and RL☆61Sep 5, 2025Updated 6 months ago
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns☆13Mar 1, 2025Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment"☆10May 5, 2024Updated last year
- [KDD'23] This is the code repo for our KDD'23 paper "DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling".☆11Jun 14, 2023Updated 2 years ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆29Sep 12, 2025Updated 5 months ago
- Identification of the Adversary from a Single Adversarial Example (ICML 2023)☆10Jul 15, 2024Updated last year
- ☆24Feb 18, 2026Updated 2 weeks ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- A lightweight, high-performance deep learning inference framework built in Rust. Zen-Infer provides a clean, modular architecture for dep…☆20Jul 31, 2025Updated 7 months ago
- ☆11Oct 25, 2024Updated last year
- An AlphaZero engine for Saiblo Connect4, featuring a pure Python implementation of key KataGo techniques.☆15Feb 26, 2026Updated last week
- ☆19May 14, 2025Updated 9 months ago
- ☆14Feb 26, 2025Updated last year
- ☆18May 3, 2025Updated 10 months ago
- Data Science Challenge☆11May 14, 2021Updated 4 years ago
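- A college network planning and design proposal (Tongji University, Computer Science class of 2020, Computer Networks course project)☆13Sep 13, 2023Updated 2 years ago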
- [DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs☆28Nov 13, 2025Updated 3 months ago
- PostgreSQL SKILLs for AI Agent☆26Feb 5, 2026Updated last month
- ICML2025: One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework☆14Jun 24, 2025Updated 8 months ago
- PyTorch implementation for our paper "Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation"☆13Apr 19, 2023Updated 2 years ago
- The official implementation of dLLM-Factory☆26Jul 12, 2025Updated 7 months ago
- MCP server for Fluent (ServiceNow SDK)☆19Feb 9, 2026Updated 3 weeks ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- Baidu Maps coordinate picker tool☆12Jan 27, 2018Updated 8 years ago
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…☆10Feb 7, 2026Updated last month
- ☆23May 26, 2025Updated 9 months ago