apple / ml-cross-entropy

☆550

Alternatives and similar repositories for ml-cross-entropy

Users who are interested in ml-cross-entropy are comparing it to the libraries listed below.

lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆544 Updated 6 months ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆272 Updated 2 weeks ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆356 Updated 11 months ago
huggingface / kernels
Load compute kernels from the Hub
☆327 Updated last week
meta-pytorch / attention-gym
Helpful tools and examples for working with flex-attention
☆1,059 Updated this week
thinking-machines-lab / batch_invariant_ops
☆907 Updated 2 weeks ago
haoliuhl / ringattention
Large Context Attention
☆752 Updated last month
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆390 Updated 4 months ago
NVIDIA / kvpress
LLM KV cache compression made easy
☆688 Updated last week
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆582 Updated 3 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆308 Updated last week
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆442 Updated 6 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆303 Updated 2 weeks ago
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆405 Updated last year
NVIDIA-NeMo / Skills
A project to improve skills of large language models
☆611 Updated this week
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆346 Updated 6 months ago
facebookresearch / spdl
Scalable and Performant Data Loading
☆335 Updated this week
huggingface / picotron_tutorial
☆225 Updated last month
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆248 Updated 5 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249 Updated 9 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆376 Updated 2 months ago
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆501 Updated 9 months ago
NVlabs / hymba
☆200 Updated 11 months ago
fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
☆920 Updated 8 months ago
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆910 Updated 2 months ago
NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆272 Updated this week
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 Updated last year
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆1,036 Updated this week
apple / ml-sigmoid-attention
☆301 Updated 6 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆212 Updated 8 months ago