rachtsy / KPCA_code
Implementation for robust ViT and scaled attention
☆21Updated 8 months ago
Alternatives and similar repositories for KPCA_code
Users who are interested in KPCA_code are comparing it to the libraries listed below.
- Fork of Flame repo for training of some new stuff in development☆19Updated 3 weeks ago
- ☆91Updated last year
- ☆82Updated last year
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"☆14Updated 7 months ago
- Minimum Description Length probing for neural network representations☆20Updated 10 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"☆85Updated 3 months ago
- H-Net Dynamic Hierarchical Architecture☆80Updated 3 months ago
- ☆59Updated last month
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources☆149Updated 2 months ago
- ☆28Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆103Updated last year
- Experimental scripts for researching data adaptive learning rate scheduling.☆22Updated 2 years ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind☆57Updated 6 months ago
- Repo for solving ARC problems with a Neural Cellular Automaton☆23Updated 7 months ago
- Explorations into the recently proposed Taylor Series Linear Attention☆100Updated last year
- ☆62Updated last year
- Code and data for paper "(How) do Language Models Track State?"☆21Updated 8 months ago
- gzip Predicts Data-dependent Scaling Laws☆34Updated last year
- JAX-like function transformation engine, but micro: microjax☆34Updated last year
- Official Code Repository for the paper "Key-value memory in the brain"☆31Updated 10 months ago
- ☆40Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆18Updated 5 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model"☆61Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.☆79Updated last month
- Implementation of GateLoop Transformer in Pytorch and Jax☆91Updated last year
- JAX Scalify: end-to-end scaled arithmetics☆17Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers☆73Updated 8 months ago
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction☆82Updated 7 months ago
- Self contained pytorch implementation of a sinkhorn based router, for mixture of experts or otherwise☆39Updated last year
- ☆35Updated last year