rachtsy / KPCA_code
Implementation for robust ViT and scaled attention
☆20 · Updated 5 months ago
Alternatives and similar repositories for KPCA_code
Users interested in KPCA_code are comparing it to the repositories listed below.
- Fork of the Flame repo for training some new stuff in development ☆17 · Updated 3 weeks ago
- ☆82 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆101 · Updated 9 months ago
- Code for the paper "Function-Space Learning Rates" ☆23 · Updated 3 months ago
- Minimum Description Length probing for neural network representations ☆18 · Updated 8 months ago
- ☆34 · Updated last year
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" ☆61 · Updated 10 months ago
- Experimental scripts for researching data-adaptive learning rate scheduling ☆22 · Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers ☆19 · Updated 2 months ago
- ☆85 · Updated last year
- ☆58 · Updated 11 months ago
- ☆19 · Updated 4 months ago
- Implementation of GateLoop Transformer in PyTorch and JAX ☆90 · Updated last year
- Implementation of Mind Evolution ("Evolving Deeper LLM Thinking") from DeepMind ☆56 · Updated 3 months ago
- 📄 Small Batch Size Training for Language Models ☆62 · Updated this week
- ☆34 · Updated last year
- JAX Scalify: end-to-end scaled arithmetic ☆16 · Updated 10 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- ☆32 · Updated last year
- Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation" ☆14 · Updated 4 months ago
- H-Net Dynamic Hierarchical Architecture ☆79 · Updated 2 weeks ago
- Collection of autoregressive model implementations ☆86 · Updated 5 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise ☆37 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆84 · Updated 2 weeks ago
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆42 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆100 · Updated last year
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆20 · Updated 4 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆146 · Updated 4 months ago