ShadeAlsha / ICon
ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning"
☆80 · Updated 2 weeks ago
Alternatives and similar repositories for ICon
Users who are interested in ICon are comparing it to the repositories listed below.
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 · Updated 7 months ago
- ☆40 · Updated 3 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind ☆50 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series. ☆181 · Updated 3 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆99 · Updated 4 months ago
- Getting crystal-like representations with harmonic loss ☆183 · Updated last month
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆163 · Updated last month
- Implementations of attention with the softpick function, naive and FlashAttention-2 ☆61 · Updated 2 weeks ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆139 · Updated 2 months ago
- ☆177 · Updated 5 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆124 · Updated 8 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆78 · Updated 3 weeks ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆36 · Updated last month
- ☆71 · Updated 8 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆63 · Updated last year
- [ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners ☆11 · Updated last month
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆91 · Updated last month
- Train, tune, and infer Bamba model ☆124 · Updated 2 weeks ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. 🔗 https//arxiv.org/abs/2411.049… ☆46 · Updated last week
- ☆31 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆99 · Updated 8 months ago
- DeMo: Decoupled Momentum Optimization ☆186 · Updated 5 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆67 · Updated 5 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ☆110 · Updated this week
- ☆34 · Updated 4 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆74 · Updated last month
- ☆94 · Updated 3 months ago
- Focused on fast experimentation and simplicity ☆72 · Updated 4 months ago
- ☆78 · Updated 8 months ago
- ☆81 · Updated last year