Farama-Foundation / A2Perf
A2Perf is a benchmark for evaluating agents on sequential decision problems that are relevant to the real world. This repository contains code for running and evaluating participants' submissions on the benchmark platform.
☆10 Updated 11 months ago
Alternatives and similar repositories for A2Perf
Users who are interested in A2Perf are comparing it to the libraries listed below.
- ALife simulation with Python: patterns, behavior, and cognition. ☆21 Updated 5 years ago
- A template gymnasium environment for users to build upon ☆21 Updated 10 months ago
- The AI-PMS Microservice uses AI to predict aircraft system failures before they occur, optimizing maintenance and enhancing safety. This … ☆11 Updated 9 months ago
- IdeaSpark is your go-to open-source AI prompting tool, designed to ignite your creativity. Discover, create, and share captivating prompt… ☆15 Updated last month
- The AgentForge project focuses on building general tooling to construct multicapability AI systems by composing skills and models togethe… ☆17 Updated last year
- Causal Analysis of Agent Behavior for AI Safety ☆18 Updated 2 years ago
- Cliff walking reinforcement learning example, with a variety of RL algorithms ☆13 Updated last year
- A forest of autonomous agents. ☆19 Updated 7 months ago
- Clean RL implementation using MLX ☆32 Updated last year
- ☆15 Updated last year
- Everything for the Paper: 'Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing' ☆17 Updated last year
- examples and guides to using Nomic Atlas ☆39 Updated 4 months ago
- Work in progress! I don't recommend looking at the code right now. ☆24 Updated 2 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆15 Updated 11 months ago
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆18 Updated last week
- Awesome Orchest projects, both official and submitted by the community. ☆25 Updated 2 years ago
- CogNetX is an advanced, multimodal neural network architecture inspired by human cognition. It integrates speech, vision, and video proce… ☆16 Updated 2 weeks ago
- Interactive Textbook Demo ☆45 Updated last year
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆44 Updated this week
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 Updated 9 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆18 Updated last year
- Produce intelligence by means of natural selection without objective/reward optimization ☆14 Updated 3 years ago
- The APAC AI Hub for Documents, Product Briefs, Plans, and SOPS, we're currently raising SAFE 100M$ at a 1Billion$ valuation. ☆10 Updated last year
- Deploy your autonomous agents to production grade environments with 99% Uptime Guarantee, Infinite Scalability, and self-healing. ☆45 Updated last week
- Official repository of the 2025 paper, LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra. ☆40 Updated last month
- Amplify your coding capabilities with AI - your smart co-pilot for an elevated coding experience. ☆13 Updated 6 months ago
- From Dataset Labeling, Entity Extraction to production Knowledge Graph Deployment: The Power of NLP and LLMs Combined. ☆12 Updated last year
- On-the-fly conversions between Jax and NumPy tensors ☆52 Updated 2 years ago
- Mangrove is the backend module of Estuary, a framework for building multimodal real-time Socially Intelligent Agents (SIAs). ☆13 Updated last month
- Open-source, knowledge-grounded conversational assistant ☆13 Updated 2 months ago