cognizant-ai-lab / neuro-san-benchmarking
General benchmarking apparatus for running multi-agent systems against benchmarks
☆39Updated 3 weeks ago
Alternatives and similar repositories for neuro-san-benchmarking
Users who are interested in neuro-san-benchmarking are comparing it to the libraries listed below.
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents☆36Updated 3 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles☆61Updated 8 months ago
- ☆61Updated 6 months ago
- ☆151Updated 3 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆87Updated 9 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.☆25Updated 3 weeks ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Updated 8 months ago
- This repository contains popular code generation frameworks such as MapCoder, CodeSIM.☆68Updated 6 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆256Updated this week
- ☆105Updated last year
- ☆79Updated 3 months ago
- ☆29Updated 2 months ago
- LIMI: Less is More for Agency☆156Updated 2 months ago
- Codebase from our first release.☆32Updated last week
- ☆15Updated 8 months ago
- never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you…☆37Updated last year
- SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning☆91Updated last month
- ☆86Updated 2 years ago
- Data recipes and robust infrastructure for training AI agents☆77Updated this week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆65Updated last year
- Official code for NeurIPS 2025 paper "AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise"☆116Updated this week
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆91Updated 11 months ago
- The original Shared Recurrent Memory Transformer implementation☆33Updated 6 months ago
- a Python library that uses Reinforcement Learning (RL) to train LLMs.☆42Updated 5 months ago
- Nexusflow function call, tool use, and agent benchmarks.☆30Updated last year
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆95Updated 2 months ago
- Lego for GRPO☆30Updated 7 months ago
- Example implementation of Iteration of Thought - Give a star if you like the project☆41Updated last year
- ☆149Updated last year
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆101Updated 4 months ago