microsoft / automated-brain-explanations
Generating and validating natural-language explanations for the brain.
☆52 Updated last month
Alternatives and similar repositories for automated-brain-explanations
Users who are interested in automated-brain-explanations are comparing it to the repositories listed below
- [NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Lea… ☆74 Updated last year
- Repository for the paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" ☆24 Updated 2 weeks ago
- We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts… ☆94 Updated 9 months ago
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 Updated 2 years ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization ☆29 Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago
- ☆29 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆44 Updated last year
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076. ☆25 Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 Updated last year
- Minimum Description Length probing for neural network representations ☆19 Updated 3 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆15 Updated 8 months ago
- Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024 ☆20 Updated 10 months ago
- Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible. ☆42 Updated 2 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- Code and data from the paper 'Human Feedback is not Gold Standard' ☆19 Updated 10 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆61 Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆17 Updated 10 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 9 months ago
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 Updated last year
- ☆25 Updated 2 years ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆13 Updated 5 months ago
- This is the official project for our paper: Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers ☆30 Updated last year
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022). ☆25 Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing ☆25 Updated 3 months ago
- SILO Language Models code repository ☆81 Updated last year
- ☆11 Updated last year
- ☆48 Updated 6 months ago
- ☆25 Updated last year