simran-arora / focus
This repo contains code for the paper: "Can Foundation Models Help Us Achieve Perfect Secrecy?"
☆24 · Updated 2 years ago
Alternatives and similar repositories for focus
Users that are interested in focus are comparing it to the libraries listed below
- Google Research ☆46 · Updated 2 years ago
- Computing the information content of trained neural networks ☆22 · Updated 4 years ago
- ☆76 · Updated last year
- Utilities for Training Very Large Models ☆58 · Updated last year
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆38 · Updated 4 months ago
- Sparse and discrete interpretability tool for neural networks ☆63 · Updated last year
- ☆26 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆80 · Updated 2 years ago
- Official repository of Pretraining Without Attention (BiGS), the first model to achieve BERT-level transfer learning on the GLUE… ☆114 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated 2 years ago
- The repository contains code for Adaptive Data Optimization ☆25 · Updated 10 months ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 · Updated 3 years ago
- ☆69 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆89 · Updated last year
- ☆26 · Updated last year
- Understanding how features learned by neural networks evolve throughout training ☆39 · Updated 11 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆28 · Updated last year
- ☆36 · Updated 3 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆31 · Updated 2 years ago
- ☆44 · Updated 10 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆43 · Updated 11 months ago
- ☆54 · Updated 2 years ago
- Code for T-MARS data filtering ☆35 · Updated 2 years ago
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ☆47 · Updated 2 years ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · Updated 2 years ago
- ☆65 · Updated last year
- My explorations into editing the knowledge and memories of an attention network ☆34 · Updated 2 years ago
- ☆10 · Updated last year
- Official code for the paper: "Metadata Archaeology" ☆19 · Updated 2 years ago