mrconter1 / BenchmarkAggregator
Comprehensive LLM evaluation framework: GPQA Diamond to Chatbot Arena. Tests all major models equally, easily extensible.
☆16 Updated 9 months ago
Alternatives and similar repositories for BenchmarkAggregator
Users who are interested in BenchmarkAggregator are comparing it to the libraries listed below.
- an auto-sleeping and -waking framework around llama.cpp ☆12 Updated 3 months ago
- One Line To Build Zero-Data Classifiers in Minutes ☆55 Updated 8 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆33 Updated last month
- entropix style sampling + GUI ☆26 Updated 7 months ago
- ☆41 Updated 5 months ago
- ☆20 Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆52 Updated 2 months ago
- ☆50 Updated last week
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆36 Updated last month
- ☆48 Updated this week
- ☆28 Updated 9 months ago
- Easiest way to build custom agents, in a no-code Notion-style editor, using simple macros. ☆27 Updated 6 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆61 Updated 9 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆26 Updated 2 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆90 Updated 4 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 6 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated 6 months ago
- Reasoning by Communicating with Agents ☆28 Updated last month
- ☆24 Updated 8 months ago
- OpenPipe Reinforcement Learning Experiments ☆25 Updated 2 months ago
- Advanced Coding AI Assistant that uses a Gradio interface to stream coding related responses. ChatRAG supports local and API inference an… ☆22 Updated last month
- ☆24 Updated 4 months ago
- ☆65 Updated 2 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated last month
- A collection of pre-built wrappers over common RAG systems like ChromaDB, Weaviate, Pinecone, and others! ☆34 Updated last week
- ☆19 Updated this week
- Web Interface for Vision Language Models Including InternVLM2 ☆22 Updated 10 months ago
- ☆21 Updated 5 months ago
- ☆29 Updated last month
- The Swarm Ecosystem ☆21 Updated 10 months ago