KindXiaoming / physics_of_skill_learning
We study toy models of skill learning.
☆28 · Updated 4 months ago
Alternatives and similar repositories for physics_of_skill_learning
Users interested in physics_of_skill_learning are comparing it to the repositories listed below.
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆36 · Updated 3 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆26 · Updated 7 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆54 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆40 · Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆19 · Updated 4 months ago
- ☆79 · Updated 9 months ago
- ☆47 · Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 9 months ago
- Code for the paper "Cottention: Linear Transformers With Cosine Attention" ☆17 · Updated 7 months ago
- Exploration of automated dataset selection approaches at large scales. ☆42 · Updated 3 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 8 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆100 · Updated 5 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆119 · Updated 7 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated 10 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 · Updated 8 months ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆28 · Updated 2 months ago
- ☆15 · Updated 6 months ago
- Official PyTorch implementation of "Vision-Language Models Create Cross-Modal Task Representations", ICML 2025 ☆26 · Updated last month
- Code for reproducing the paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆10 · Updated 2 months ago
- A repository for research on medium-sized language models. ☆76 · Updated last year
- Unofficial implementation of the Selective Attention Transformer ☆16 · Updated 7 months ago
- Here we will test various linear attention designs. ☆58 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated last year
- Code, results, and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family ☆29 · Updated 2 months ago
- ☆19 · Updated 10 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆22 · Updated 2 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆18 · Updated 3 months ago
- RWKV-X is a linear-complexity hybrid language model based on the RWKV architecture, integrating sparse attention to improve the model's l… ☆37 · Updated last month
- ☆17 · Updated last month