GitsSaikat / Guardian-Agent
Improving AI Systems with Self-Defense Mechanisms
☆19Updated 6 months ago
Alternatives and similar repositories for Guardian-Agent
Users who are interested in Guardian-Agent are comparing it to the libraries listed below
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents☆31Updated 2 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆77Updated 6 months ago
- ☆27Updated 3 months ago
- ☆56Updated 2 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆96Updated 2 weeks ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆103Updated 5 months ago
- ☆19Updated 6 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆33Updated 4 months ago
- ☆34Updated last month
- This repository contains popular code generation frameworks such as MapCoder and CodeSIM.☆58Updated 2 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 6 months ago
- Resa: Transparent Reasoning Models via SAEs☆41Updated last month
- The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"☆147Updated this week
- The code repository of the paper: Competition and Attraction Improve Model Fusion☆150Updated 3 weeks ago
- ☆16Updated last month
- Official repo of paper LM2☆43Updated 7 months ago
- ☆23Updated 11 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆120Updated 7 months ago
- ☆67Updated 5 months ago
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation☆34Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆55Updated 7 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆105Updated 3 months ago
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)☆36Updated last week
- accompanying material for sleep-time compute paper☆111Updated 4 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper☆50Updated 5 months ago
- Code for paper called Self-Training Elicits Concise Reasoning in Large Language Models☆41Updated 4 months ago
- ☆54Updated 10 months ago
- ☆68Updated 3 months ago
- The original Shared Recurrent Memory Transformer implementation☆31Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆95Updated this week