☆19Apr 16, 2025Updated last year
Alternatives and similar repositories for default-moe
Users that are interested in default-moe are comparing it to the libraries listed below.
- Official repository for ICCV 2023: Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class☆13Oct 16, 2023Updated 2 years ago
- PyCUDA based PyTorch Extension Made Easy☆27Mar 22, 2024Updated 2 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated 2 months ago
- [ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts☆33Nov 10, 2025Updated 5 months ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆21Nov 28, 2022Updated 3 years ago
- some mixture of experts architecture implementations☆27Mar 22, 2024Updated 2 years ago
- Code for our ICCV 2025 paper "CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers."☆59Oct 30, 2025Updated 6 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"☆12Oct 14, 2025Updated 6 months ago
- ☆11Jul 21, 2024Updated last year
- ☆33Nov 19, 2025Updated 5 months ago
- ☆65Apr 28, 2026Updated last week
- An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- Korean Text Data Generator for OCR tasks.☆10Aug 20, 2020Updated 5 years ago
- Structured Neuron Level Pruning to compress Transformer-based models [ECCV'24]☆17Aug 7, 2024Updated last year
- [ICML2024] "FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees" by Jiaha…☆14Sep 22, 2024Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆28Feb 17, 2025Updated last year
- ☆13Apr 27, 2026Updated last week
- Project showing how to develop NKI kernels for Llama 3.2 1B inference☆21May 29, 2025Updated 11 months ago
- Official code for Cumulative Spatial Knowledge Distillation for Vision Transformers (ICCV-2023) https://openaccess.thecvf.com/content/ICC…☆15Nov 5, 2023Updated 2 years ago
- ☆14Dec 21, 2024Updated last year
- Slides and other materials for club meetings☆17Jun 26, 2022Updated 3 years ago
- Official implementation of ICLR 2025 'LORO: Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization'☆17Apr 24, 2025Updated last year