☆18Apr 16, 2025Updated 11 months ago
Alternatives and similar repositories for default-moe
Users that are interested in default-moe are comparing it to the libraries listed below.
- Official repository for ICCV 2023: Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class☆13Oct 16, 2023Updated 2 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated last month
- PyCUDA based PyTorch Extension Made Easy☆27Mar 22, 2024Updated 2 years ago
- [ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts☆34Nov 10, 2025Updated 5 months ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆21Nov 28, 2022Updated 3 years ago
- Code for our ICCV 2025 paper "CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers."☆55Oct 30, 2025Updated 5 months ago
- some mixture of experts architecture implementations☆27Mar 22, 2024Updated 2 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"☆12Oct 14, 2025Updated 6 months ago
- ☆11Jul 21, 2024Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆28Feb 17, 2025Updated last year
- ☆33Nov 19, 2025Updated 4 months ago
- ☆64Apr 8, 2026Updated last week
- An extension to the GaLore paper, to perform Natural Gradient Descent in a low-rank subspace☆18Oct 21, 2024Updated last year
- Korean Text Data Generator for OCR tasks.☆10Aug 20, 2020Updated 5 years ago
- Structured Neuron Level Pruning to compress Transformer-based models [ECCV'24]☆17Aug 7, 2024Updated last year
- [ICML2024] "FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees" by Jiaha…☆14Sep 22, 2024Updated last year
- ☆13Apr 1, 2026Updated 2 weeks ago
- Project showing how to develop NKI kernels for Llama 3.2 1B inference☆21May 29, 2025Updated 10 months ago
- Official code for Cumulative Spatial Knowledge Distillation for Vision Transformers (ICCV-2023) https://openaccess.thecvf.com/content/ICC…☆15Nov 5, 2023Updated 2 years ago
- ☆14Dec 21, 2024Updated last year
- Slides and other materials for club meetings☆17Jun 26, 2022Updated 3 years ago
- Official implementation of ICLR 2025 'LORO: Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization'☆16Apr 24, 2025Updated 11 months ago
- ☆29May 24, 2024Updated last year
- aigc evals☆10Dec 2, 2023Updated 2 years ago
- ☆23Jan 5, 2025Updated last year
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models☆18Dec 6, 2023Updated 2 years ago
- ☆18Jan 4, 2024Updated 2 years ago
- Download ebooks from the Project Gutenberg☆13Dec 30, 2024Updated last year
- An efficient implementation of learned optimizers in PyTorch☆46Apr 5, 2026Updated last week
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆39Nov 1, 2024Updated last year
- ☆14Jul 13, 2025Updated 9 months ago
- ☆19Mar 12, 2026Updated last month
- repository for CharacterChat, a personalized social support system☆75Jul 13, 2024Updated last year
- Official code for ICCV 2023 paper "Convolutional Networks with Oriented 1D Kernels"☆48Jan 30, 2024Updated 2 years ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆38Aug 29, 2025Updated 7 months ago
- Model Predictive Path Integral Control (MPPI) with PyTorch☆18Jan 26, 2024Updated 2 years ago
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models☆35Sep 19, 2025Updated 6 months ago
- Explore training for quantized models☆26Jul 12, 2025Updated 9 months ago