LIONS-EPFL / scion
☆21Updated 3 weeks ago
Alternatives and similar repositories for scion
Users that are interested in scion are comparing it to the libraries listed below
- ☆53Updated 8 months ago
- ☆12Updated 3 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]☆66Updated 9 months ago
- ☆9Updated 2 years ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam☆80Updated 10 months ago
- ☆13Updated 5 months ago
- ☆19Updated last year
- ☆53Updated last year
- ☆28Updated last year
- Efficient PScan implementation in PyTorch☆16Updated last year
- Mixture of A Million Experts☆46Updated 10 months ago
- ☆61Updated 7 months ago
- Code for the paper "Function-Space Learning Rates"☆20Updated 3 weeks ago
- A repo based on XiLin Li's PSGD repo that extends some of the experiments.☆14Updated 8 months ago
- supporting pytorch FSDP for optimizers☆82Updated 6 months ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.☆68Updated 10 months ago
- ☆190Updated 6 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs☆36Updated 2 years ago
- ☆34Updated 9 months ago
- Combining SOAP and MUON☆16Updated 4 months ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆31Updated 7 months ago
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.☆44Updated 4 months ago
- Latest Weight Averaging (NeurIPS HITY 2022)☆30Updated 2 years ago
- Blog post☆17Updated last year
- ☆31Updated 7 months ago
- ☆36Updated 2 months ago
- ☆32Updated last year
- ☆32Updated last year
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient"☆29Updated 4 years ago
- Minimal but scalable implementation of large language models in JAX☆35Updated 7 months ago