mrconter1 / BenchmarkAggregator
Comprehensive LLM evaluation framework: GPQA Diamond to Chatbot Arena. Tests all major models equally, easily extensible.
☆16Updated last year
Alternatives and similar repositories for BenchmarkAggregator
Users who are interested in BenchmarkAggregator are comparing it to the libraries listed below.
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆92Updated 7 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure☆50Updated 11 months ago
- ☆104Updated 2 months ago
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆59Updated 8 months ago
- LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.☆106Updated last month
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles☆54Updated 4 months ago
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a…☆38Updated 4 months ago
- ☆30Updated last month
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆54Updated 5 months ago
- ☆17Updated 8 months ago
- Local LLM inference & management server with built-in OpenAI API☆31Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours☆64Updated last year
- One Line To Build Zero-Data Classifiers in Minutes☆58Updated 11 months ago
- an auto-sleeping and -waking framework around llama.cpp☆12Updated 6 months ago
- ☆57Updated 6 months ago
- ☆40Updated 8 months ago
- Pivotal Token Search☆123Updated last month
- CLI that uses DSPy to interact with MCP servers.☆23Updated 5 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆85Updated 5 months ago
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r…☆68Updated last year
- Framework-Agnostic RL Environments for LLM Fine-Tuning☆34Updated this week
- Simple examples using Argilla tools to build AI☆55Updated 9 months ago
- Embed anything.☆28Updated last year
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems"☆56Updated 6 months ago
- Very minimal (and stateless) agent framework☆45Updated 7 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆22Updated 9 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆76Updated 8 months ago
- An MCP server that uses the Osmosis-Apply-1.7B model to apply code merges☆51Updated 2 months ago
- Nexusflow function call, tool use, and agent benchmarks.☆29Updated 8 months ago
- ☆60Updated last month