srush / prof8
Experimental paper writing linter.
☆28 Updated 2 weeks ago
Related projects:
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 Updated 3 weeks ago
- ☆47 Updated 3 months ago
- ☆35 Updated 5 months ago
- Personal solutions to the Triton Puzzles ☆11 Updated 2 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆24 Updated 5 months ago
- Utilities for efficient fine-tuning, inference and evaluation of code generation models ☆21 Updated 11 months ago
- ☆25 Updated 5 months ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆18 Updated last year
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆28 Updated 4 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆53 Updated 5 months ago
- ☆42 Updated 3 months ago
- ☆48 Updated 4 months ago
- Here we will test various linear attention designs. ☆55 Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆84 Updated 4 months ago
- Using FlexAttention to compute attention with different masking patterns ☆28 Updated last week
- Experiment of using Tangent to autodiff triton ☆66 Updated 7 months ago
- ☆45 Updated 7 months ago
- ☆42 Updated 7 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆94 Updated 2 weeks ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆14 Updated 7 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆31 Updated 3 months ago
- ☆22 Updated 6 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆15 Updated last week
- Triton Implementation of HyperAttention Algorithm ☆46 Updated 9 months ago
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf ☆14 Updated last month
- Blog post ☆16 Updated 7 months ago
- ☆28 Updated last week
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆26 Updated last week
- Make triton easier ☆39 Updated 3 months ago
- Official source code for "Graph Neural Networks for Learning Equivariant Representations of Neural Networks". In ICLR 2024 (oral). ☆63 Updated last month