Lightning-AI / lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
☆1,199 · Updated this week
Related projects
Alternatives and complementary repositories for lightning-thunder
- PyTorch native quantization and sparsity for training and inference ☆1,592 · Updated this week
- A native PyTorch Library for large model training ☆2,635 · Updated this week
- Puzzles for learning Triton ☆1,138 · Updated this week
- Transform datasets at scale. Optimize datasets for fast AI model training. ☆368 · Updated this week
- Schedule-Free Optimization in PyTorch ☆1,900 · Updated 2 weeks ago
- Tile primitives for speedy kernels ☆1,661 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 · Updated 3 weeks ago
- Minimalistic large language model 3D-parallelism training ☆1,265 · Updated this week
- TensorDict is a PyTorch-dedicated tensor container. ☆841 · Updated this week
- A simple, performant and scalable JAX LLM! ☆1,532 · Updated this week
- A modern model graph visualizer and debugger ☆1,058 · Updated this week
- NanoGPT (124M) in 5 minutes ☆1,269 · Updated this week
- A PyTorch quantization backend for Optimum ☆828 · Updated last week
- Open weights language model from Google DeepMind, based on Griffin. ☆607 · Updated 4 months ago
- What would you do with 1000 H100s... ☆910 · Updated 10 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆804 · Updated 3 months ago
- For optimization algorithm research and development. ☆451 · Updated this week
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆717 · Updated last month
- Training LLMs with QLoRA + FSDP ☆1,420 · Updated 2 weeks ago
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch. ☆687 · Updated 3 months ago
- ☆892 · Updated last month
- GPU programming related news and material links ☆1,242 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,048 · Updated this week
- UNet diffusion model in pure CUDA ☆584 · Updated 4 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆702 · Updated this week
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,680 · Updated this week
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,436 · Updated 3 weeks ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,138 · Updated last month