rachtsy / KPCA_code

Implementation for robust ViT and scaled attention

☆21

Alternatives and similar repositories for KPCA_code

Users who are interested in KPCA_code are comparing it to the repositories listed below.

EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20 Updated 9 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆19 Updated last week
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99 Updated last year
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85 Updated 2 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 10 months ago
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80 Updated 2 months ago
epfml / DenseFormer
☆82 Updated last year
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind
☆57 Updated 5 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 6 months ago
GenRobo / MatMamba
Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"
☆61 Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131 Updated 2 weeks ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆22 Updated 2 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 Updated last year
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆68 Updated last year
shikaiqiu / compute-better-spent
☆61 Updated last year
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25 Updated 10 months ago
RobertCsordas / moeut
☆88 Updated last year
watcl-lab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14 Updated 5 months ago
Aleph-Alpha-Research / trigrams
☆58 Updated this week
martin-marek / batch-size
📄 Small Batch Size Training for Language Models
☆63 Updated last month
EleutherAI / training-jacobian
☆23 Updated 11 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85 Updated last year
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆39 Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66 Updated last year
Ping-C / optimizer
This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…
☆40 Updated 2 years ago
samblouir / birdie
☆13 Updated 5 months ago
schwartz-lab-NLP / Tokens2Words
☆13 Updated 7 months ago
jfpuget / ARC-AGI-Challenge-2024
☆56 Updated 11 months ago
lucidrains / sinkhorn-router-pytorch
Self contained pytorch implementation of a sinkhorn based router, for mixture of experts or otherwise
☆39 Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52 Updated 2 years ago