fishmingyu / GeoT
GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU
☆22 Updated this week
Alternatives and similar repositories for GeoT:
Users who are interested in GeoT are comparing it to the libraries listed below:
- ☆21 Updated 3 months ago
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆54 Updated this week
- PyTorch centric eager mode debugger ☆44 Updated 2 months ago
- Personal solutions to the Triton Puzzles ☆17 Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 11 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆44 Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆43 Updated 6 months ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆17 Updated 3 months ago
- Simplified implementation of a UMAP-like dimensionality reduction algorithm ☆44 Updated 2 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 Updated last year
- Experiment of using Tangent to autodiff triton ☆75 Updated last year
- Graph neural networks in JAX. ☆67 Updated 7 months ago
- Fast and memory-efficient exact attention ☆59 Updated this week
- Experimental paper writing linter. ☆34 Updated 5 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆56 Updated 3 weeks ago
- Prototype routines for GPU quantization written using PyTorch. ☆19 Updated this week
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆75 Updated 2 months ago
- Make triton easier ☆43 Updated 8 months ago
- Awesome Triton Resources ☆19 Updated 2 months ago
- Sparsity support for PyTorch ☆33 Updated this week
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 Updated last year
- ☆76 Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆103 Updated 2 months ago
- extensible collectives library in triton ☆82 Updated 4 months ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆46 Updated last year
- ML/DL Math and Method notes ☆58 Updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated last month
- Fast training of unitary deep network layers from low-rank updates ☆28 Updated 2 years ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 Updated 2 months ago