pytorch / torchcodec
PyTorch video decoding
☆47 · Updated last week
Related projects:
- Transformer with Mu-Parameterization, implemented in JAX/Flax; supports FSDP on TPU pods. ☆29 · Updated 3 weeks ago
- Make Triton easier. ☆39 · Updated 3 months ago
- ☆66 · Updated 3 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs. ☆43 · Updated 3 weeks ago
- Simple and fast low-bit matmul kernels in CUDA. ☆48 · Updated this week
- ☆68 · Updated 2 months ago
- An implementation of the Llama architecture, to instruct and delight. ☆21 · Updated last month
- PyTorch-centric eager-mode debugger. ☆43 · Updated 2 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind. ☆105 · Updated 3 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (official code). ☆118 · Updated 2 weeks ago
- PyTorch half-precision GEMM library with fused optional bias and optional ReLU/GELU. ☆25 · Updated 3 weeks ago
- Here we will test various linear attention designs. ☆55 · Updated 4 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients". ☆82 · Updated 3 weeks ago
- Experiment of using Tangent to autodiff Triton. ☆66 · Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆34 · Updated 2 months ago
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆115 · Updated last month
- A library for unit scaling in PyTorch. ☆94 · Updated 2 weeks ago
- ☆15 · Updated 6 months ago
- Triton implementation of the HyperAttention algorithm. ☆46 · Updated 9 months ago
- Megatron's multi-modal data loader. ☆42 · Updated this week
- ☆190 · Updated last week
- Implementation of Infini-Transformer in PyTorch. ☆100 · Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention". ☆87 · Updated 8 months ago
- PyTorch implementation of models from the Zamba2 series. ☆63 · Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters. ☆94 · Updated 2 weeks ago
- ☆42 · Updated 3 weeks ago
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training. ☆110 · Updated 5 months ago
- Ring-attention experiments. ☆89 · Updated 5 months ago
- A user-friendly toolchain that enables seamless execution of ONNX models using JAX as the backend. ☆94 · Updated this week
- Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts. ☆101 · Updated last year