samsja / muon_fsdp_2
Muon fsdp 2
☆53 · Aug 8, 2025 · Updated 6 months ago
Alternatives and similar repositories for muon_fsdp_2
Users who are interested in muon_fsdp_2 are comparing it to the libraries listed below.
- Official pytorch code for "APP: Anytime Progressive Pruning" (DyNN @ ICML, 2022; CLL @ ACML, 2022, SNN @ ICML, 2022 and SlowDNN 2023) ☆16 · Nov 22, 2022 · Updated 3 years ago
- ☆13 · Dec 12, 2025 · Updated 2 months ago
- Stick-breaking attention ☆62 · Jul 1, 2025 · Updated 7 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 · Aug 19, 2025 · Updated 5 months ago
- All-in-one benchmarking platform for evaluating LLM. ☆15 · Nov 12, 2025 · Updated 3 months ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆19 · Jun 11, 2025 · Updated 8 months ago
- ☆18 · Oct 15, 2020 · Updated 5 years ago
- ☆20 · Oct 10, 2025 · Updated 4 months ago
- Spectral Sphere Optimizer ☆96 · Jan 14, 2026 · Updated last month
- Fast Inference in Denoising Diffusion Models via MMD Finetuning ☆18 · Dec 4, 2023 · Updated 2 years ago
- Research work aimed at addressing the problem of modeling infinite-length context ☆46 · Dec 18, 2025 · Updated last month
- To deploy Transformer models in CV to mobile devices. ☆18 · Jan 20, 2022 · Updated 4 years ago
- Manually implemented quantization-aware training ☆23 · Oct 12, 2022 · Updated 3 years ago
- Solidity contracts for the decentralized Prime Network protocol ☆26 · Jul 6, 2025 · Updated 7 months ago
- Simple and scalable tools for data-driven pretraining data selection. ☆29 · Jun 9, 2025 · Updated 8 months ago
- 🔥 A minimal training framework for scaling FLA models ☆344 · Nov 15, 2025 · Updated 3 months ago
- Fast sparse deep learning on CPUs ☆56 · Sep 28, 2022 · Updated 3 years ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆29 · Feb 6, 2026 · Updated last week
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- DeMo: Decoupled Momentum Optimization ☆198 · Dec 2, 2024 · Updated last year
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆33 · Sep 28, 2025 · Updated 4 months ago
- ☆64 · Apr 9, 2024 · Updated last year
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆37 · Oct 11, 2023 · Updated 2 years ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆32 · Jul 17, 2023 · Updated 2 years ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient ☆66 · Aug 3, 2025 · Updated 6 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆40 · Dec 2, 2023 · Updated 2 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆38 · Dec 10, 2015 · Updated 10 years ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆128 · Oct 9, 2025 · Updated 4 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆151 · Sep 19, 2025 · Updated 4 months ago
- ☆11 · Apr 17, 2021 · Updated 4 years ago
- ☆54 · Dec 17, 2025 · Updated last month
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework ☆312 · Sep 6, 2025 · Updated 5 months ago
- Little article showing how to load pytorch's models with linear memory consumption ☆34 · Aug 29, 2022 · Updated 3 years ago
- An artificial matrix generator in C ☆12 · Feb 16, 2023 · Updated 3 years ago
- Code for the paper "Faster Neural Network Training with Approximate Tensor Operations" ☆10 · Oct 23, 2021 · Updated 4 years ago
- MATLAB function to fill an area with hatching ~~or speckling~~ ☆11 · Mar 4, 2018 · Updated 7 years ago
- Debiasing Through Data Attribution ☆12 · May 23, 2024 · Updated last year
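- Re-identification of campus bicycles at surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display ☆10 · Feb 13, 2024 · Updated 2 years ago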
- An Efficient BPE Algorithm Faster than Hugging Face Tokenizer's Implementation ☆13 · Sep 9, 2024 · Updated last year