microsoft / eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
☆106 · Updated this week
Alternatives and similar repositories for eureka-ml-insights:
Users who are interested in eureka-ml-insights are comparing it to the libraries listed below.
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆157 · Updated this week
- Codebase accompanying the Summary of a Haystack paper. ☆74 · Updated 5 months ago
- ☆32 · Updated last week
- Functional Benchmarks and the Reasoning Gap ☆82 · Updated 4 months ago
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments ☆44 · Updated last month
- ☆117 · Updated 4 months ago
- ☆32 · Updated 7 months ago
- ☆141 · Updated 7 months ago
- Code for ExploreTom ☆75 · Updated 2 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆102 · Updated 4 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆100 · Updated 5 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆162 · Updated last week
- The first dense retrieval model that can be prompted like an LM ☆64 · Updated 5 months ago
- ☆40 · Updated 9 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆161 · Updated last week
- Code and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… ☆43 · Updated last week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆48 · Updated 2 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆118 · Updated 6 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆197 · Updated 4 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆126 · Updated this week
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆175 · Updated this week
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆98 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 · Updated 5 months ago
- ☆78 · Updated last month
- ☆164 · Updated last month
- ☆73 · Updated last month
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Pap… ☆145 · Updated 2 months ago