Intel Gaudi's Megatron DeepSpeed Large Language Models for training
☆18 · Dec 19, 2024 · Updated last year
Alternatives and similar repositories for Megatron-DeepSpeed
Users who are interested in Megatron-DeepSpeed are comparing it to the libraries listed below.
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆14 · Jan 8, 2026 · Updated 2 months ago
- ☆11 · May 8, 2025 · Updated 10 months ago
- [ICML2025] LoRA fine-tune directly on the quantized models. ☆39 · Nov 25, 2024 · Updated last year
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆64 · Sep 18, 2025 · Updated 6 months ago
- ☆28 · Nov 29, 2024 · Updated last year
- ☆11 · Jun 29, 2021 · Updated 4 years ago
- Modular RDMA Interface ☆94 · Updated this week
- Quantization in the Jagged Loss Landscape of Vision Transformers ☆13 · Oct 22, 2023 · Updated 2 years ago
- ☆13 · May 8, 2025 · Updated 10 months ago
- ☆45 · Dec 20, 2023 · Updated 2 years ago
- Nebula: Deep Neural Network Benchmarks in C++ ☆13 · Jan 2, 2025 · Updated last year
- Novel image segmentation datasets collected from endoscopic videos of sinus surgery processes ☆13 · Feb 11, 2023 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆16 · Updated this week
- [HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System ☆53 · Jul 21, 2025 · Updated 8 months ago
- GATSBI: Generative Adversarial Training for Simulation-Based Inference ☆19 · Jul 13, 2023 · Updated 2 years ago
- ☆22 · Oct 27, 2024 · Updated last year
- Kite: Architecture Simulator for RISC-V Instruction Set ☆20 · Updated this week
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] ☆14 · Dec 9, 2024 · Updated last year
- OpenVINO™ optimization for PointPillars* ☆30 · May 5, 2025 · Updated 10 months ago
- LaTeX templates: R&E, graduation thesis, beamer, etc. (compiled PDF output not included) ☆63 · Mar 11, 2025 · Updated last year
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆170 · Jan 8, 2026 · Updated 2 months ago
- A fork of the Linux kernel for p2pmem enabled devices like NVMe devices with CMBs, Microsemi NVRAM card (and other devices that can expos… ☆29 · Mar 2, 2026 · Updated 3 weeks ago
- ☆10 · Apr 29, 2023 · Updated 2 years ago
- ☆24 · Oct 9, 2025 · Updated 5 months ago
- ☆11 · Aug 19, 2020 · Updated 5 years ago
- PPoPP24 AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping ☆22 · May 8, 2024 · Updated last year
- Spline interpolation with FITPACK for xtensor. ☆14 · Apr 10, 2018 · Updated 7 years ago
- ☆20 · Jun 1, 2025 · Updated 9 months ago
- ☆10 · Updated this week
- ☆32 · Dec 22, 2025 · Updated 3 months ago
- Artifact for "DX100: A Programmable Data Access Accelerator for Indirection (ISCA 2025)" paper ☆17 · Nov 6, 2025 · Updated 4 months ago
- Open-source AI acceleration on FPGA: from ONNX to RTL ☆49 · Updated this week
- ☆14 · Aug 3, 2024 · Updated last year
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆207 · Mar 16, 2026 · Updated last week
- ☆14 · Sep 21, 2020 · Updated 5 years ago
- Accepted to MLSys 2026 ☆73 · Mar 5, 2026 · Updated 2 weeks ago
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning" ☆17 · Aug 4, 2020 · Updated 5 years ago
- simple, fast, and slick non-disturbing buffer list ☆24 · Jan 13, 2023 · Updated 3 years ago
- Ongoing research training transformer models at scale ☆39 · Updated this week