facebookresearch / pal
PAL: Predictive Analysis & Laws of Large Language Models
☆35 Updated 4 months ago
Alternatives and similar repositories for pal:
Users who are interested in pal are comparing it to the libraries listed below.
- Aioli: A unified optimization framework for language model data mixing ☆25 Updated 3 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 Updated 2 months ago
- Recycling diverse models ☆44 Updated 2 years ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated 7 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 Updated 6 months ago
- ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning" ☆80 Updated 2 weeks ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆118 Updated 6 months ago
- Minimum Description Length probing for neural network representations ☆19 Updated 3 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 Updated 11 months ago
- Official implementation for Sparse MetA-Tuning (SMAT) ☆16 Updated 10 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆29 Updated last year
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆20 Updated 3 weeks ago
- Google Research ☆46 Updated 2 years ago
- ☆53 Updated 7 months ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind ☆61 Updated 7 months ago
- ☆25 Updated last year
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆17 Updated 10 months ago
- ☆72 Updated 2 weeks ago
- Understanding how features learned by neural networks evolve throughout training ☆34 Updated 6 months ago
- ☆49 Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 7 months ago
- Efficiently computing & storing token n-grams from large corpora ☆23 Updated 7 months ago
- ☆31 Updated 4 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆99 Updated 8 months ago
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆51 Updated 11 months ago
- ☆37 Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆40 Updated 2 months ago
- The repository contains code for Adaptive Data Optimization ☆24 Updated 5 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆18 Updated this week
- Generating and validating natural-language explanations for the brain. ☆52 Updated last month