microsoft / eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
☆28 Updated this week
Related projects:
- Official implementation of Goldfish Loss: Mitigating Memorization in Generative LLMs ☆68 Updated 2 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 Updated 11 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 Updated 8 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆19 Updated 9 months ago
- ☆36 Updated last month
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆82 Updated 2 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆76 Updated 6 months ago
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆38 Updated 2 months ago
- ☆68 Updated last month
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆56 Updated last month
- Explain a black-box module in natural language. ☆33 Updated 3 weeks ago
- We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts… ☆91 Updated last month
- ☆29 Updated 2 weeks ago
- Research on Tabular Foundation Models ☆21 Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆39 Updated 2 weeks ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆55 Updated last week
- PASTA: Post-hoc Attention Steering for LLMs ☆96 Updated last week
- SILO Language Models code repository ☆80 Updated 6 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆27 Updated 2 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆72 Updated 4 months ago
- ☆57 Updated last week
- ☆136 Updated 7 months ago
- [NeurIPS 2023] PyTorch code for Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind ☆67 Updated 8 months ago
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆56 Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆65 Updated 2 months ago
- ☆116 Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆70 Updated last month
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ☆37 Updated 5 months ago
- ☆27 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 Updated 7 months ago