☆19Apr 16, 2025Updated last year
Alternatives and similar repositories for default-moe
Users who are interested in default-moe are comparing it to the libraries listed below.
- Official repository for ICCV 2023: Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class☆13Oct 16, 2023Updated 2 years ago
- PyCUDA based PyTorch Extension Made Easy☆27Mar 22, 2024Updated 2 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated 4 months ago
- [ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts☆34Nov 10, 2025Updated 7 months ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆21Nov 28, 2022Updated 3 years ago
- some mixture of experts architecture implementations☆27Mar 22, 2024Updated 2 years ago
- Code for our ICCV 2025 paper "CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers."☆70Oct 30, 2025Updated 8 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆32Nov 12, 2024Updated last year
- Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"☆12Oct 14, 2025Updated 8 months ago
- ☆11Jul 21, 2024Updated last year
- ☆40Nov 19, 2025Updated 7 months ago
- A high-performance acceleration library dedicated to large-scale model training on AMD GPUs☆66Updated this week
- An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆19Oct 21, 2024Updated last year
- Korean Text Data Generator for OCR tasks.☆10Aug 20, 2020Updated 5 years ago
- Structured Neuron Level Pruning to compress Transformer-based models [ECCV'24]☆16Aug 7, 2024Updated last year
- [ICML2024] "FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees" by Jiaha…☆14Sep 22, 2024Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆31Feb 17, 2025Updated last year
- ☆13May 4, 2026Updated 2 months ago
- Project showing how to develop NKI kernels for Llama 3.2 1B inference☆21May 29, 2025Updated last year
- Official code for Cumulative Spatial Knowledge Distillation for Vision Transformers (ICCV-2023) https://openaccess.thecvf.com/content/ICC…☆15Nov 5, 2023Updated 2 years ago
- Slides and other materials for club meetings☆17Jun 26, 2022Updated 4 years ago
- Official implementation of ICLR 2025 'LORO: Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization'☆18Apr 24, 2025Updated last year
- ☆29May 24, 2024Updated 2 years ago
- aigc evals☆10Dec 2, 2023Updated 2 years ago
- ☆14Dec 21, 2024Updated last year
- ☆23Jan 5, 2025Updated last year
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models☆18Dec 6, 2023Updated 2 years ago
- ☆18Jan 4, 2024Updated 2 years ago
- Download ebooks from Project Gutenberg☆14Dec 30, 2024Updated last year
- An efficient implementation of learned optimizers in PyTorch☆58Jun 23, 2026Updated last week
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆39Nov 1, 2024Updated last year
- ☆14Jul 13, 2025Updated 11 months ago
- ☆19Mar 12, 2026Updated 3 months ago
- repository for CharacterChat, a personalized social support system☆75Jul 13, 2024Updated last year
- Official code for ICCV 2023 paper "Convolutional Networks with Oriented 1D Kernels"☆47Jan 30, 2024Updated 2 years ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆38Aug 29, 2025Updated 10 months ago
- Model Predictive Path Integral Control (MPPI) with PyTorch☆18Jan 26, 2024Updated 2 years ago
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models☆42Sep 19, 2025Updated 9 months ago
- Explore training for quantized models☆26Jul 12, 2025Updated 11 months ago