mrconter1 / BenchmarkAggregator

Comprehensive LLM evaluation framework: GPQA Diamond to Chatbot Arena. Tests all major models equally, easily extensible.

☆16

Alternatives and similar repositories for BenchmarkAggregator

Users who are interested in BenchmarkAggregator are comparing it to the repositories listed below.

FarFetchd / sleepyllama
an auto-sleeping and -waking framework around llama.cpp
☆12 Updated 3 months ago
kenhktsui / anyclassifier
One Line To Build Zero-Data Classifiers in Minutes
☆55 Updated 8 months ago
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆33 Updated last month
EduardTalianu / EntropixLab
entropix style sampling + GUI
☆26 Updated 7 months ago
miralab-ai / autoreason
☆41 Updated 5 months ago
iulia-b10 / multilingual-embedding-models
☆20 Updated last year
egozverev / Should-It-Be-Executed-Or-Processed
Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.
☆52 Updated 2 months ago
yueqis / API-Based-Agent
☆50 Updated last week
lechmazur / pgg_bench
Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a…
☆36 Updated last month
sambanova / agents
☆48 Updated this week
slashml / awesome-finetuning
☆28 Updated 9 months ago
NaturalAgents / NaturalAgents
Easiest way to build custom agents, in a no-code notion style editor, using simple macros.
☆27 Updated 6 months ago
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆61 Updated 9 months ago
axolotl-ai-cloud / grpo_code
A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.
☆26 Updated 2 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆90 Updated 4 months ago
kyegomez / Exa
Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…
☆26 Updated 6 months ago
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 Updated 6 months ago
allenai / recoma
Reasoning by Communicating with Agents
☆28 Updated last month
dinobby / MAgICoRE
☆24 Updated 8 months ago
OpenPipe / rl-experiments
OpenPipe Reinforcement Learning Experiments
☆25 Updated 2 months ago
JakeFurtaw / Chat-RAG
Advanced Coding AI Assistant that uses a Gradio interface to stream coding related responses. ChatRAG supports local and API inference an…
☆22 Updated last month
lightblue-tech / lb-reranker
☆24 Updated 4 months ago
du-nlp-lab / MLR-Copilot
☆65 Updated 2 months ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32 Updated last month
The-Swarm-Corporation / swarms-memory
A collection of pre-built wrappers over common RAG systems like ChromaDB, Weaviate, Pinecone, and others!
☆34 Updated last week
OpenMOSS / Lorsa
☆19 Updated this week
sammcj / vlm-ui
Web Interface for Vision Language Models Including InternVLM2
☆22 Updated 10 months ago
rohinmanvi / Capability-Aware_and_Mid-Generation_Self-Evaluations
☆21 Updated 5 months ago
EQ-bench / creative-writing-bench
☆29 Updated last month
The-Swarm-Corporation / swarm-ecosystem
The Swarm Ecosystem
☆21 Updated 10 months ago