fattorib / Flax-ResNets
CIFAR10 ResNets implemented in JAX+Flax
☆12 Updated 3 years ago
Alternatives and similar repositories for Flax-ResNets
Users who are interested in Flax-ResNets are comparing it to the repositories listed below.
- ☆51 Updated last year
- Recycling diverse models ☆45 Updated 2 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆31 Updated 2 years ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆80 Updated last year
- ☆21 Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆84 Updated last year
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- ☆37 Updated last year
- ☆23 Updated 2 years ago
- Explores the ideas presented in Deep Ensembles: A Loss Landscape Perspective (https://arxiv.org/abs/1912.02757) by Stanislav Fort, Huiyi … ☆65 Updated 4 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆37 Updated 2 years ago
- ☆74 Updated 2 years ago
- Blog post ☆17 Updated last year
- This repository holds code and other relevant files for the NeurIPS 2022 tutorial: Foundational Robustness of Foundation Models. ☆71 Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 9 months ago
- ☆29 Updated 2 years ago
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆62 Updated 4 years ago
- ☆18 Updated 2 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last month
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆75 Updated last year
- ☆20 Updated last year
- Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023] ☆36 Updated 10 months ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021). ☆59 Updated 3 years ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆31 Updated 2 years ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind ☆65 Updated 10 months ago
- Using FlexAttention to compute attention with different masking patterns ☆44 Updated 9 months ago
- ☆10 Updated 2 years ago
- ☆16 Updated 2 years ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆34 Updated last year
- ☆55 Updated 11 months ago