Memory optimized Mixture of Experts
☆75Jul 25, 2025Updated 7 months ago
Alternatives and similar repositories for MoMoE-impl
Users that are interested in MoMoE-impl are comparing it to the libraries listed below.
- Engine for collecting, uploading, and downloading model activations☆27Apr 2, 2025Updated 11 months ago
- Vortex: A Flexible and Efficient Sparse Attention Framework☆49Jan 21, 2026Updated 2 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆116Mar 7, 2026Updated 2 weeks ago
- ☆40Aug 20, 2025Updated 7 months ago
- ☆15Aug 19, 2025Updated 7 months ago
- Render documents on a virtual paper with folds and other types of damage using blender geometry nodes.☆26Aug 14, 2023Updated 2 years ago
- Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism☆27Apr 4, 2025Updated 11 months ago
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score.☆13Jan 21, 2021Updated 5 years ago
- ☆28Feb 28, 2025Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆170Feb 11, 2026Updated last month
- Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025☆31Oct 22, 2025Updated 5 months ago
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks☆20May 10, 2022Updated 3 years ago
- Supporting code for the blog post on modular manifolds.☆120Sep 26, 2025Updated 5 months ago
- A better wrapper for using RDMA programming APIs in Rust flavor☆80Mar 15, 2026Updated last week
- Peking University Fall 2024 Compiler Principles course: lab code, notes, and experience☆17Sep 12, 2025Updated 6 months ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- A lightweight, user-friendly data-plane for LLM training.☆38Sep 10, 2025Updated 6 months ago
- Simple & Scalable Pretraining for Neural Architecture Research☆309Dec 6, 2025Updated 3 months ago
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆44Nov 19, 2025Updated 4 months ago
- ☆31Dec 31, 2025Updated 2 months ago
- ☆36Jan 10, 2026Updated 2 months ago
- An extension of the GaLore paper that performs Natural Gradient Descent in a low-rank subspace☆18Oct 21, 2024Updated last year
- Hints for installing and completing xv6lab☆12Jan 28, 2021Updated 5 years ago
- Rust standalone inference of Namo-500M series models. Extremely tiny, running VLM on CPU.☆24Mar 12, 2025Updated last year
- ☆44Sep 8, 2025Updated 6 months ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- Some recommendation algorithms and research☆12Sep 16, 2016Updated 9 years ago
- ☆137Mar 20, 2025Updated last year
- A collection of resources for CS 2051, an undergraduate Honors Discrete Mathematics course at Georgia Tech.☆10Jun 24, 2023Updated 2 years ago
- ☆12Aug 26, 2021Updated 4 years ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)☆278Updated this week
- ☆11Oct 21, 2024Updated last year
- Cerule - A Tiny Mighty Vision Model☆68Nov 9, 2025Updated 4 months ago
- Efficient solutions to Project Euler (https://projecteuler.net/) problems.☆11Feb 12, 2017Updated 9 years ago
- Generic build server☆64May 25, 2014Updated 11 years ago
- ☆22Apr 17, 2025Updated 11 months ago
- vLLM adapter for a TGIS-compatible gRPC server.☆55Updated this week
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated last month
- Memento-Skills: Let Agents Design Agents☆110Updated this week