KindXiaoming / physics_of_skill_learningLinks

We study toy models of skill learning.

☆31

Alternatives and similar repositories for physics_of_skill_learning

Users that are interested in physics_of_skill_learning are comparing it to the libraries listed below

Sorting:

RobertCsordas / moeut
☆86Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated last year
epfml / DenseFormer
☆81Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 10 months ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆119Updated last year
dmis-lab / Monet
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆73Updated 4 months ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103Updated 10 months ago
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆44Updated 2 weeks ago
Benjamin-Walker / selective-ssms-and-linear-cdes
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆15Updated 9 months ago
katiekang1998 / reasoning_generalization
☆33Updated 9 months ago
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆167Updated 9 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
srush / mamba-primer
☆38Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated this week
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52Updated 2 years ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated last year
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85Updated last month
shikaiqiu / compute-better-spent
☆60Updated last year
fangyuan-ksgk / selective-attention-transformer
Unofficial Implementation of Selective Attention Transformer
☆17Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated last year
wang-kee / LiNeS
Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"
☆31Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆100Updated last year
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind
☆57Updated 5 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
SHI-Labs / CompactNet
☆32Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year