microsoft / eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
☆89 Updated this week
Related projects
Alternatives and complementary repositories for eureka-ml-insights
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- Automatic Evals for Instruction-Tuned Models ☆65 Updated this week
- The first dense retrieval model that can be prompted like an LM ☆63 Updated 2 months ago
- ☆101 Updated 3 months ago
- ☆41 Updated 2 weeks ago
- ☆68 Updated 3 months ago
- ☆40 Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 4 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆127 Updated this week
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 Updated 3 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆148 Updated last month
- Code for training & evaluating Contextual Document Embedding models ☆119 Updated this week
- Automating enterprise workflows with multimodal agents ☆95 Updated last month
- ☆44 Updated 6 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆41 Updated last month
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆96 Updated last month
- Code accompanying "How I learned to start worrying about prompt formatting". ☆95 Updated last month
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆93 Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 Updated 2 months ago
- ☆128 Updated this week
- ☆112 Updated last month
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆77 Updated 8 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆97 Updated last year
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆132 Updated this week
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆129 Updated this week
- PyTorch implementation for MRL ☆18 Updated 9 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 Updated 8 months ago