Intel Gaudi's Megatron DeepSpeed Large Language Models for training
☆18Dec 19, 2024Updated last year
Alternatives and similar repositories for Megatron-DeepSpeed
Users that are interested in Megatron-DeepSpeed are comparing it to the libraries listed below.
- [PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…☆10Aug 13, 2024Updated last year
- [DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation☆12Jul 13, 2023Updated 2 years ago
- ☆12May 8, 2025Updated last year
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs☆27Jun 25, 2024Updated last year
- ☆21Jun 6, 2024Updated last year
- Demo on iGPU for FFmpeg decode and scale, OpenVINO inference. This is a zero-copy solution, which means no frame data is copied from CPU to iGPU…☆17Jan 25, 2023Updated 3 years ago
- ☆28Nov 29, 2024Updated last year
- Machine Reading Comprehension Leaderboard Summary☆12Jan 4, 2021Updated 5 years ago
- Modular RDMA Interface☆123Updated this week
- ☆12Mar 22, 2025Updated last year
- ☆15May 8, 2025Updated last year
- ☆20Apr 9, 2019Updated 7 years ago
- ☆46Dec 20, 2023Updated 2 years ago
- Extract Chinese/English QA Data from WikiHow pages.☆16May 21, 2023Updated 3 years ago
- ☆17Jul 18, 2022Updated 3 years ago
- Evaluate how vLLM and SGLang perform when running a small LLM model on a mid-range NVIDIA GPU☆21May 10, 2026Updated last week
- GATSBI: Generative Adversarial Training for Simulation-Based Inference☆19Jul 13, 2023Updated 2 years ago
- Developer kits reference setup scripts for various kinds of Intel platforms and GPUs☆45Updated this week
- C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with einstein-like notation☆11Oct 16, 2023Updated 2 years ago
- Kite: Architecture Simulator for RISC-V Instruction Set☆20Mar 22, 2026Updated 2 months ago
- Pointer Networks in PyTorch☆16Nov 7, 2023Updated 2 years ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]☆15Dec 9, 2024Updated last year
- ☆17Dec 11, 2022Updated 3 years ago
- The implementation for the work "Graph-Free Knowledge Distillation for Graph Neural Networks".☆19Aug 13, 2021Updated 4 years ago
- Nested named entity recognition implemented with BERT and the MRC framework☆19Mar 13, 2022Updated 4 years ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator☆172Jan 8, 2026Updated 4 months ago
- ☆10Apr 29, 2023Updated 3 years ago
- ☆11Aug 19, 2020Updated 5 years ago
- Graph partitioning for distributed GNN training☆13Mar 26, 2023Updated 3 years ago
- Spline interpolation with FITPACK for xtensor.☆14Apr 10, 2018Updated 8 years ago
- ☆30Updated this week
- Open-source AI acceleration on FPGA: from ONNX to RTL☆54May 14, 2026Updated last week
- ☆14Aug 3, 2024Updated last year
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)☆210May 15, 2026Updated last week
- A new DRAM substrate that mitigates the excessive energy consumption from both (i) transmitting unused data on the memory channel and (i…☆14Aug 23, 2024Updated last year
- This is a repository with examples to run inference endpoints on various ALCF clusters☆28Updated this week
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning"☆17Aug 4, 2020Updated 5 years ago
- simple, fast, and slick non-disturbing buffer list☆24Jan 13, 2023Updated 3 years ago
- MAD (Model Automation and Dashboarding)☆36Updated this week