cognizant-ai-lab / neuro-san-benchmarking
General benchmarking apparatus for running multi-agent systems against benchmarks
☆34 Updated last week
Alternatives and similar repositories for neuro-san-benchmarking
Users who are interested in neuro-san-benchmarking are comparing it to the repositories listed below.
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆84 Updated 9 months ago
- ☆101 Updated last week
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆26 Updated this week
- Accompanying material for the sleep-time compute paper ☆118 Updated 7 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆66 Updated last year
- SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning ☆89 Updated last month
- This repository contains popular code generation frameworks such as MapCoder, CodeSIM. ☆67 Updated 5 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆93 Updated last month
- ☆15 Updated 7 months ago
- Official code repository for Sketch-of-Thought (SoT) ☆128 Updated 7 months ago
- ☆63 Updated 5 months ago
- ☆202 Updated 2 weeks ago
- Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization ☆36 Updated 3 weeks ago
- ☆98 Updated this week
- ☆102 Updated last year
- ☆79 Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 Updated 9 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 11 months ago
- ☆68 Updated 6 months ago
- Code for the paper: Generative Interfaces for Language Models ☆104 Updated 2 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆61 Updated 7 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆81 Updated last year
- Nexusflow function call, tool use, and agent benchmarks. ☆30 Updated last year
- The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models" ☆293 Updated this week
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆144 Updated 3 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆91 Updated 6 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆87 Updated 2 weeks ago
- ☆84 Updated last year
- look how they massacred my boy ☆63 Updated last year
- ☆35 Updated 7 months ago