rachtsy / KPCA_code
Implementation for robust ViT and scaled attention
☆18 Updated 5 months ago
Alternatives and similar repositories for KPCA_code:
Users who are interested in KPCA_code compare it with the repositories listed below.
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 Updated 6 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 Updated last week
- ☆52 Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 6 months ago
- Efficient Scaling laws and collaborative pretraining. ☆15 Updated 2 months ago
- JAX Scalify: end-to-end scaled arithmetics ☆15 Updated 4 months ago
- ☆37 Updated 11 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 Updated 6 months ago
- ☆91 Updated 2 months ago
- ☆74 Updated 7 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆24 Updated last month
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 Updated 2 weeks ago
- ☆33 Updated 6 months ago
- Minimum Description Length probing for neural network representations ☆19 Updated last month
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆28 Updated last month
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods. ☆30 Updated 3 months ago
- Aioli: A unified optimization framework for language model data mixing ☆22 Updated 2 months ago