☆18Apr 16, 2025Updated 11 months ago
Alternatives and similar repositories for default-moe
Users that are interested in default-moe are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official repository for ICCV 2023: Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class☆13Oct 16, 2023Updated 2 years ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes.☆22Feb 19, 2026Updated last month
- PyCUDA based PyTorch Extension Made Easy☆27Mar 22, 2024Updated 2 years ago
- [ICML 2025] Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts☆31Nov 10, 2025Updated 4 months ago
- Code for our ICCV 2025 paper "CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers."☆50Oct 30, 2025Updated 4 months ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆21Nov 28, 2022Updated 3 years ago
- some mixture of experts architecture implementations☆26Mar 22, 2024Updated 2 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"☆12Oct 14, 2025Updated 5 months ago
- ☆11Jul 21, 2024Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- ☆29Nov 19, 2025Updated 4 months ago
- ☆64Updated this week
- An extention to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Korean Text Data Generator for OCR tasks.☆10Aug 20, 2020Updated 5 years ago
- Structured Neuron Level Pruning to compress Transformer-based models [ECCV'24]☆17Aug 7, 2024Updated last year
- [ICML2024] "FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees" by Jiaha…☆14Sep 22, 2024Updated last year
- ☆13Jan 15, 2025Updated last year
- Project showing how to develop NKI kernels for Llama 3.2 1B inference☆21May 29, 2025Updated 9 months ago
- Official code for Cumulative Spatial Knowledge Distillation for Vision Transformers (ICCV-2023) https://openaccess.thecvf.com/content/ICC…☆15Nov 5, 2023Updated 2 years ago
- ☆14Dec 21, 2024Updated last year
- Slides and other materials for club meetings☆17Jun 26, 2022Updated 3 years ago
- Official implementation of ICLR 2025 'LORO: Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization'☆16Apr 24, 2025Updated 11 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- ☆29May 24, 2024Updated last year
- aigc evals☆10Dec 2, 2023Updated 2 years ago
- ☆23Jan 5, 2025Updated last year
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models☆18Dec 6, 2023Updated 2 years ago
- ☆18Jan 4, 2024Updated 2 years ago
- Download ebooks from the Project Gutenberg☆13Dec 30, 2024Updated last year
- An efficient implementation of learned optimizers in PyTorch☆45Dec 2, 2025Updated 3 months ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆39Nov 1, 2024Updated last year
- ☆15Jul 13, 2025Updated 8 months ago
- NordVPN Special Discount Offer • AdSave on top-rated NordVPN 1 or 2-year plans with secure browsing, privacy protection, and support for for all major platforms.
- ☆19Mar 12, 2026Updated last week
- repository for CharacterChat, a personalized social support system☆76Jul 13, 2024Updated last year
- Official code for ICCV 2023 paper "Convolutional Networks with Oriented 1D Kernels"☆48Jan 30, 2024Updated 2 years ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆38Aug 29, 2025Updated 6 months ago
- Model Predictive Path Integral Control (MPPI) with PyTorch☆18Jan 26, 2024Updated 2 years ago
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models☆34Sep 19, 2025Updated 6 months ago
- Explore training for quantized models☆26Jul 12, 2025Updated 8 months ago