LINs-lab / MASArena
A comprehensive framework for benchmarking single- and multi-agent systems across a wide range of tasks. It evaluates performance, accuracy, and efficiency, with built-in visualization and tool integration.
☆20 · Updated this week
Alternatives and similar repositories for MASArena
Users interested in MASArena are comparing it to the repositories listed below.
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆42 · Updated last month
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆140 · Updated 2 weeks ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆44 · Updated 5 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆210 · Updated this week
- A collection of papers on discrete diffusion models ☆145 · Updated 2 weeks ago
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆54 · Updated 2 months ago
- ☆65 · Updated 2 months ago
- The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usa… ☆28 · Updated 3 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆75 · Updated 4 months ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆26 · Updated this week
- ☆139 · Updated last month
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆184 · Updated this week
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆134 · Updated 2 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆123 · Updated this week
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆65 · Updated this week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆95 · Updated 3 months ago
- ☆47 · Updated 3 weeks ago
- 🔥 How to efficiently and effectively compress the CoTs or directly generate concise CoTs during inference while maintaining the reasonin… ☆52 · Updated last month
- Reproducing R1 for Code with Reliable Rewards ☆222 · Updated last month
- ☆227 · Updated last week
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" ☆40 · Updated 7 months ago
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning … ☆52 · Updated 3 weeks ago
- [arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆32 · Updated last month
- Paper list, tutorial, and nano code snippet for Diffusion Large Language Models. ☆75 · Updated this week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆73 · Updated last week
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba… ☆28 · Updated 3 months ago
- Paper List of Inference/Test Time Scaling/Computing ☆264 · Updated last week
- ☆20 · Updated last month
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" ☆20 · Updated 3 months ago
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de… ☆55 · Updated 11 months ago