apple / ml-act
☆49 · Updated 7 months ago
Alternatives and similar repositories for ml-act
Users who are interested in ml-act are comparing it to the libraries listed below.
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆55 · Updated 4 months ago
- Sparse Autoencoders for Stable Diffusion XL models. ☆67 · Updated 3 weeks ago
- WIP ☆93 · Updated 11 months ago
- Synthetic Alphabet Dataset ☆19 · Updated 3 months ago
- ☆34 · Updated 10 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 · Updated 2 years ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam. ☆82 · Updated 11 months ago
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆31 · Updated last month
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆82 · Updated 3 weeks ago
- Focused on fast experimentation and simplicity ☆76 · Updated 6 months ago
- ☆52 · Updated last year
- DeMo: Decoupled Momentum Optimization ☆189 · Updated 7 months ago
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆127 · Updated 10 months ago
- PyTorch library for Active Fine-Tuning ☆87 · Updated 5 months ago
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards. ☆167 · Updated 3 weeks ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆129 · Updated last year
- Implementations of attention with the softpick function, naive and FlashAttention-2 ☆80 · Updated 2 months ago
- Sparse and discrete interpretability tool for neural networks ☆63 · Updated last year
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆103 · Updated this week
- Concept Learning Dynamics ☆14 · Updated 8 months ago
- ☆33 · Updated 6 months ago
- Latent Diffusion Language Models ☆68 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆29 · Updated 8 months ago
- Official repo for the paper "Weight-based Decomposition: A Case for Bilinear MLPs" ☆22 · Updated 7 months ago
- PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning ☆224 · Updated last week
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆94 · Updated last month
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆59 · Updated 3 months ago
- Implementation of a multimodal diffusion transformer in PyTorch ☆102 · Updated last year
- ☆61 · Updated 8 months ago
- ☆17 · Updated 7 months ago