ethz-spylab / superhuman-ai-consistency
☆28 Updated last year
Alternatives and similar repositories for superhuman-ai-consistency:
Users who are interested in superhuman-ai-consistency are comparing it to the libraries listed below.
- The repository contains code for Adaptive Data Optimization ☆21 Updated last month
- ☆26 Updated last year
- Minimum Description Length probing for neural network representations ☆18 Updated last week
- ☆20 Updated 3 months ago
- Efficient Scaling laws and collaborative pretraining. ☆13 Updated 2 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆42 Updated last year
- ☆20 Updated 3 months ago
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs) ☆16 Updated last month
- Is In-Context Learning Sufficient for Instruction Following in LLMs? ☆26 Updated 7 months ago
- Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23) ☆15 Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated 10 months ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆28 Updated last year
- ☆16 Updated 6 months ago
- ☆17 Updated 2 years ago
- ☆21 Updated last week
- ☆34 Updated last year
- ☆35 Updated 2 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 Updated 7 months ago
- Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024) ☆19 Updated 8 months ago
- Code for T-MARS data filtering ☆35 Updated last year
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆15 Updated 2 years ago
- Official code for the paper: "Metadata Archaeology" ☆18 Updated last year
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" ☆17 Updated last week
- ☆23 Updated last month
- Sparse and discrete interpretability tool for neural networks ☆58 Updated 11 months ago
- Few-shot Learning with Auxiliary Data ☆26 Updated last year
- ☆14 Updated 10 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year