SophieZheng998/ALI-Agent

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/SophieZheng998/ALI-Agent)

SophieZheng998 / ALI-Agent

Official implementation for "ALI-Agent: Assessing LLMs'Alignment with Human Values via Agent-based Evaluation"

☆21

Alternatives and similar repositories for ALI-Agent

Users that are interested in ALI-Agent are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

SophieZheng998 / RSafe
View on GitHub
Official implementation for "RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards"
☆17Jan 31, 2026Updated 5 months ago
VL-Group / 2022-NeurIPS-DAA
View on GitHub
The code of the paper of "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval" accepted b…
☆19Jan 16, 2024Updated 2 years ago
yimutianyang / HGSR
View on GitHub
Implement of our TKDE paper: Hyperbolic Graph Learning for Social Recommendation
☆13Jun 3, 2024Updated 2 years ago
junkangwu / Dr_DPO
View on GitHub
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆19Jun 1, 2024Updated 2 years ago
hanshen95 / SEAL
View on GitHub
An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection.
☆24Feb 20, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
eltociear / MolCA
View on GitHub
Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".
☆12Dec 27, 2023Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
View on GitHub
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11Feb 7, 2026Updated 5 months ago
yimutianyang / KDD24-GBSR
View on GitHub
☆21Dec 26, 2024Updated last year
AkaliKong / PaperClaw
View on GitHub
☆22Mar 11, 2026Updated 4 months ago
tanganke / subspace_fusion
View on GitHub
Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"
☆14Mar 28, 2024Updated 2 years ago
anzhang314 / InvCF
View on GitHub
Official code of "Invariant Collaborative Filtering to Popularity Distribution Shift" (2023 WWW)
☆21Jul 27, 2023Updated 3 years ago
fangjf1 / OpenSafeMLRM
View on GitHub
The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods!
☆15Apr 8, 2025Updated last year
yimutianyang / SIGIR23-VGCL
View on GitHub
Tensorflow implementation of our SIGIR 2023 accepted paper "Generative-Contrastive Graph Learning for Recommendation"
☆32Aug 26, 2024Updated last year
acharkq / MolCA
View on GitHub
Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".
☆85Feb 25, 2024Updated 2 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
chenyuxin1999 / S-DPO
View on GitHub
[NeurIPS 2024] The implementation of paper "On Softmax Direct Preference Optimization for Recommendation"
☆101Nov 29, 2024Updated last year
zjunlp / LookAheadTuning
View on GitHub
[WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews
☆17Dec 14, 2025Updated 7 months ago
AlphaLab-USTC / Must-Read-LLM-Papers
View on GitHub
☆19Sep 16, 2025Updated 10 months ago
SchwinnL / circuit-breakers-eval
View on GitHub
Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting
☆18Apr 15, 2025Updated last year
syr-cn / ReLM
View on GitHub
[EMNLP 2023] ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction.
☆22Jan 28, 2024Updated 2 years ago
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
Jayfeather1024 / Backdoor-Enhanced-Alignment
View on GitHub
☆24Dec 8, 2024Updated last year
YangZhengyi98 / RecInterpreter
View on GitHub
☆25Nov 16, 2023Updated 2 years ago
noahcao / disentanglement_lib_med
View on GitHub
[NeurIPS 2022] disentanglement evaluation robust to model dimension variance.
☆10Sep 21, 2022Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ethz-spylab / jailbreak-tax
View on GitHub
☆24Feb 17, 2026Updated 5 months ago
real-absolute-AI / SynthRL
View on GitHub
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
☆70Jul 24, 2025Updated last year
AlphaLab-USTC / AutoWiki-skill
View on GitHub
☆61Apr 9, 2026Updated 3 months ago
TMoneyBidness / CSV_Agent_with_Prompts
View on GitHub
This is an agent (including contextual prompts) that queries your CSV
☆10Jun 8, 2023Updated 3 years ago
syr-cn / ReactXT
View on GitHub
[ACL 2024] ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining. by Zhiyuan Liu*, Yaoru…
☆30Sep 3, 2024Updated last year
CERT-Lab / fed-sb
View on GitHub
(TMLR J2C Certification) Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tu…
☆27Oct 4, 2025Updated 9 months ago
junkangwu / QAE
View on GitHub
[ICLR 2026] Quantile Advantage Estimation for Entropy-Safe Reasoning
☆29Oct 14, 2025Updated 9 months ago
Unispac / shallow-vs-deep-alignment
View on GitHub
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆190Apr 23, 2025Updated last year
yimutianyang / SIGIR2021-EGLN
View on GitHub
The implement of papar "Enhanced Graph Learning for Collaborative Filtering via Mutual Information Maximization"
☆18Aug 11, 2021Updated 4 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
homles11 / SaLoRA
View on GitHub
Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation(ICLR 2025)”
☆29Oct 23, 2025Updated 9 months ago
princeton-polaris-lab / Evaluating-Durable-Safeguards
View on GitHub
[ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs
☆13Jun 20, 2025Updated last year
rmin2000 / adv_tracing
View on GitHub
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10Jul 15, 2024Updated 2 years ago
ASTRAL-Group / ASTRA
View on GitHub
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…
☆62Jul 5, 2025Updated last year
git-disl / Virus
View on GitHub
This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
☆56Feb 2, 2025Updated last year
Arthurma71 / AdvDrop
View on GitHub
☆11Mar 8, 2024Updated 2 years ago
IBM / NeuralFuse
View on GitHub
[NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes" by Hao-Lun …
☆10Sep 18, 2025Updated 10 months ago