microsoft / eureka-ml-insights

A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.

☆165

Alternatives and similar repositories for eureka-ml-insights

Users who are interested in eureka-ml-insights are comparing it to the libraries listed below.


facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆99 · Updated 3 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175 · Updated 5 months ago
microsoft / llm-steer-instruct
A method for steering LLMs to better follow instructions
☆48 · Updated 3 weeks ago
allenai / infinigram-api
☆73 · Updated 3 weeks ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 · Updated 10 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆81 · Updated this week
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆160 · Updated last year
google-deepmind / mishax
☆136 · Updated 4 months ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆209 · Updated this week
apple / ml-superposition-prompting
☆145 · Updated last year
jxmorris12 / cde
Code for training and evaluating Contextual Document Embedding models
☆196 · Updated 2 months ago
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆131 · Updated last year
huggingface / gpt-oss-recipes
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
☆222 · Updated this week
facebookresearch / ExploreToM
Code for ExploreToM
☆84 · Updated last month
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆91 · Updated 2 months ago
HishamAlyahya / semantic_backprop
Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖
☆72 · Updated 8 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 · Updated 10 months ago
tianyang-x / SaySelf
Public code repository for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"
☆108 · Updated 10 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103 · Updated 5 months ago
SalesforceAIResearch / CRMArena
Official Repo for CRMArena and CRMArena-Pro
☆104 · Updated last month
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆218 · Updated this week
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 6 months ago
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆115 · Updated 10 months ago
ibm-granite / granite-3.0-language-models
☆261 · Updated last month
PrimeIntellect-ai / genesys
☆130 · Updated 4 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 · Updated 3 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 · Updated 6 months ago
lamini-ai / Lamini-Memory-Tuning
Banishing LLM Hallucinations Requires Rethinking Generalization
☆276 · Updated last year
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆119 · Updated 6 months ago
allenai / DataDecide
☆31 · Updated last week