Intel Gaudi's Megatron-DeepSpeed for training large language models
☆18 · Dec 19, 2024 · Updated last year
Alternatives and similar repositories for Megatron-DeepSpeed
Users that are interested in Megatron-DeepSpeed are comparing it to the libraries listed below.
- [PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min… ☆10 · Aug 13, 2024 · Updated last year
- [DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation ☆12 · Jul 13, 2023 · Updated 2 years ago
- [ICML2025] LoRA fine-tune directly on the INT4 models. ☆40 · Nov 25, 2024 · Updated last year
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆63 · Sep 18, 2025 · Updated 6 months ago
- Full End-to-End examples showing how to use First-gen Gaudi and Gaudi2 in common use cases ☆13 · Dec 2, 2024 · Updated last year
- ☆21 · Jun 6, 2024 · Updated last year
- Demo on iGPU for FFmpeg decode and scale with OpenVINO inference. This is a zero-copy solution, which means no frame data is copied from CPU to iGPU… ☆17 · Jan 25, 2023 · Updated 3 years ago
- ☆28 · Nov 29, 2024 · Updated last year
- ☆11 · Jun 29, 2021 · Updated 4 years ago
- Quantization in the Jagged Loss Landscape of Vision Transformers ☆13 · Oct 22, 2023 · Updated 2 years ago
- ☆14 · May 8, 2025 · Updated 11 months ago
- ☆20 · Apr 9, 2019 · Updated 7 years ago
- ☆45 · Dec 20, 2023 · Updated 2 years ago
- Novel image segmentation datasets collected from endoscopic videos of sinus surgery processes ☆13 · Feb 11, 2023 · Updated 3 years ago
- Nebula: Deep Neural Network Benchmarks in C++ ☆13 · Jan 2, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Updated this week
- A branch of marss with DRAMSim hooks ☆19 · Aug 22, 2013 · Updated 12 years ago
- Evaluate how vLLM and SGLang perform when running a small LLM model on a mid-range NVIDIA GPU ☆20 · Apr 2, 2026 · Updated last week
- GATSBI: Generative Adversarial Training for Simulation-Based Inference ☆19 · Jul 13, 2023 · Updated 2 years ago
- C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with einstein-like notation ☆11 · Oct 16, 2023 · Updated 2 years ago
- Professional CUDA C Programming ☆31 · Jul 13, 2020 · Updated 5 years ago
- Pointer Networks in PyTorch ☆17 · Nov 7, 2023 · Updated 2 years ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] ☆14 · Dec 9, 2024 · Updated last year
- OpenVINO™ optimization for PointPillars* ☆32 · May 5, 2025 · Updated 11 months ago
- ☆17 · Dec 11, 2022 · Updated 3 years ago
- The implementation for the work "Graph-Free Knowledge Distillation for Graph Neural Networks". ☆19 · Aug 13, 2021 · Updated 4 years ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆171 · Jan 8, 2026 · Updated 3 months ago
- A curriculum for learning about gpu performance engineering, from scratch to what the frontier AI labs do ☆561 · Mar 2, 2026 · Updated last month
- Combining deep neural networks with PCA and k-NN classification for abdominal organ recognition in ultrasound images. ☆28 · Oct 12, 2021 · Updated 4 years ago
- ☆24 · Oct 9, 2025 · Updated 6 months ago
- Graph partitioning for distributed GNN training ☆13 · Mar 26, 2023 · Updated 3 years ago
- Spline interpolation with FITPACK for xtensor. ☆14 · Apr 10, 2018 · Updated 8 years ago
- PPoPP24 AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping ☆22 · May 8, 2024 · Updated last year
- ☆22 · Jun 1, 2025 · Updated 10 months ago
- ☆33 · Dec 22, 2025 · Updated 3 months ago
- Artifact for "DX100: A Programmable Data Access Accelerator for Indirection (ISCA 2025)" paper ☆17 · Nov 6, 2025 · Updated 5 months ago
- ☆14 · Aug 3, 2024 · Updated last year
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆209 · Apr 3, 2026 · Updated last week
- Accepted to MLSys 2026 ☆75 · Mar 5, 2026 · Updated last month