microsoft / eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
☆168 · Updated last week
Alternatives and similar repositories for eureka-ml-insights
Users interested in eureka-ml-insights are comparing it to the libraries listed below.
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆177 · Updated 5 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆101 · Updated 4 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆213 · Updated 3 weeks ago
- A method for steering LLMs to better follow instructions ☆49 · Updated 3 weeks ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆73 · Updated 8 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆88 · Updated this week
- Code for training and evaluating Contextual Document Embedding models ☆197 · Updated 3 months ago
- ☆77 · Updated last week
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 11 months ago
- The first dense retrieval model that can be prompted like an LM ☆83 · Updated 3 months ago
- Code for ExploreTom ☆85 · Updated 2 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆223 · Updated this week
- Evaluating LLMs with fewer examples ☆160 · Updated last year
- Official Repo for CRMArena and CRMArena-Pro ☆109 · Updated 2 months ago
- ☆139 · Updated last week
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆117 · Updated 3 weeks ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 7 months ago
- ☆145 · Updated last year
- awesome synthetic (text) datasets ☆295 · Updated last month
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆132 · Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆150 · Updated 6 months ago
- Train your own SOTA deductive reasoning model ☆104 · Updated 5 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 10 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆100 · Updated 3 weeks ago
- (ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019… ☆148 · Updated last week
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆276 · Updated last year
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆131 · Updated last year
- PyTorch library for Active Fine-Tuning ☆89 · Updated 6 months ago
- ☆41 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆105 · Updated 8 months ago