open-lm-engine/lm-engine

LM engine is a library for pretraining/finetuning LLMs

☆184

Alternatives and similar repositories for lm-engine

Users who are interested in lm-engine are comparing it to the libraries listed below.

open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆91 · Updated this week
mayank31398 / ladder-residual-inference
☆14 · Jul 13, 2025 · Updated last year
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 3 weeks ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆281 · Oct 3, 2025 · Updated 9 months ago
OliverSieberling / dynamic-conv1d
Triton kernels for dynamic causal short convolutions.
☆24 · Jun 4, 2026 · Updated last month
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆255 · Jun 21, 2026 · Updated last month
machine-discovery / deer
Parallelizing non-linear sequential models over the sequence length
☆57 · Jun 23, 2025 · Updated last year
Yifei-Zuo / Parallax
Official repository for Parallax (Parameterized Local Linear Attention)
☆65 · Jul 7, 2026 · Updated 2 weeks ago
hao-ai-lab / DistCA
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆106 · Apr 7, 2026 · Updated 3 months ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,070 · Updated this week
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆191 · Apr 8, 2026 · Updated 3 months ago
Dao-AILab / grouped-latent-attention
☆135 · May 29, 2025 · Updated last year
StarTrail-org / RAG-DS-Serve
[AAAI26]: DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval
☆53 · Jan 28, 2026 · Updated 5 months ago
caikit / caikit-nlp
☆12 · Oct 7, 2025 · Updated 9 months ago
fla-org / hybrid-distillation
☆34 · Dec 31, 2025 · Updated 6 months ago
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆273 · May 5, 2026 · Updated 2 months ago
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆22 · Mar 15, 2025 · Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 · Sep 25, 2024 · Updated last year
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆403 · Apr 22, 2026 · Updated 3 months ago
yaof20 / DenseMixer
Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient
☆68 · Aug 3, 2025 · Updated 11 months ago
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆980 · Jul 4, 2026 · Updated 2 weeks ago
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
aHapBean / xHC
[Tech Report] Expanded Hyper-Connections
☆49 · Updated this week
fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,409 · Updated this week
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆44 · Oct 23, 2025 · Updated 9 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆58 · Aug 20, 2024 · Updated last year
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆191 · Updated this week
HazyResearch / Megakernels
Kernels, of the mega variety :)
☆786 · May 26, 2026 · Updated last month
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆630 · Mar 13, 2026 · Updated 4 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 · Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,563 · Jul 13, 2026 · Updated last week
allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆31 · Aug 19, 2025 · Updated 11 months ago
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆535 · Updated this week
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆888 · Updated this week
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 · Aug 28, 2025 · Updated 10 months ago
chen-hao-chao / mdm-prime-v2
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models
☆27 · May 23, 2026 · Updated 2 months ago
wdlctc / delta-attention-residuals-code
Delta Attention Residuals - supplementary code and pretrained models
☆40 · May 20, 2026 · Updated 2 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆63 · Jul 1, 2025 · Updated last year
NVlabs / GatedDeltaNet-2
Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
☆247 · May 25, 2026 · Updated last month