mrconter1 / BenchmarkAggregator
Comprehensive LLM evaluation framework: GPQA Diamond to Chatbot Arena. Tests all major models equally, easily extensible.
☆16 Updated last year
Alternatives and similar repositories for BenchmarkAggregator
Users who are interested in BenchmarkAggregator are comparing it to the libraries listed below.
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 Updated last year
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆52 Updated last year
- never forget anything again! combine AI and intelligent tooling for a local knowledge base to track, catalogue, annotate, and plan for you… ☆37 Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆24 Updated last year
- Simple examples using Argilla tools to build AI ☆57 Updated last year
- Pivotal Token Search ☆144 Updated last month
- ☆37 Updated 6 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 Updated 3 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆65 Updated last year
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI … ☆56 Updated 11 months ago
- entropix style sampling + GUI ☆27 Updated last year
- CLI that uses DSPy to interact with MCP servers. ☆24 Updated 11 months ago
- ☆107 Updated 3 months ago
- II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset ☆31 Updated 10 months ago
- Very minimal (and stateless) agent framework ☆44 Updated last year
- ☆17 Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆61 Updated 9 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆58 Updated 10 months ago
- Example implementation of Iteration of Thought - Give a star if you like the project ☆41 Updated last year
- OpenAI GPT hosted Agent Framework for Windows and MacOS ☆36 Updated last year
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆90 Updated last month
- powerful and fast tool calling agents ☆80 Updated 10 months ago
- Embed anything. ☆27 Updated last year
- Who needs o1 anyways. Add CoT to any OpenAI compatible endpoint. ☆44 Updated last year
- One Line To Build Zero-Data Classifiers in Minutes ☆63 Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated last year
- an auto-sleeping and -waking framework around llama.cpp ☆12 Updated last year
- CursorCore: Assist Programming through Aligning Anything ☆133 Updated 11 months ago
- Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (like Qwen3's /think and /no_think), and… ☆50 Updated 8 months ago
- ☆30 Updated last year