wiegerw / nerva
C++ and Python libraries for neural networks.
☆14 · Updated 6 months ago
Alternatives and similar repositories for nerva:
Users interested in nerva are comparing it to the libraries listed below.
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 · Updated last month
- ☆49 · Updated last year
- ☆52 · Updated 6 months ago
- Prototype routines for GPU quantization written using PyTorch. ☆21 · Updated 2 months ago
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 · Updated 8 months ago
- Triton Implementation of HyperAttention Algorithm ☆47 · Updated last year
- seqax = sequence modeling + JAX ☆154 · Updated 2 weeks ago
- Make triton easier ☆47 · Updated 10 months ago
- ☆50 · Updated 5 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last week
- An extension of the GaLore paper, performing Natural Gradient Descent in a low-rank subspace ☆16 · Updated 6 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆41 · Updated this week
- ☆43 · Updated last year
- ☆13 · Updated last month
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" ☆14 · Updated 2 months ago
- Hacks for PyTorch ☆19 · Updated 2 years ago
- An experiment in using Tangent to autodiff Triton ☆78 · Updated last year
- JAX/Flax implementation of the Hyena Hierarchy ☆34 · Updated 2 years ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆111 · Updated 4 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 · Updated this week
- ☆22 · Updated 2 years ago
- ☆32 · Updated 6 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 6 months ago
- The Energy Transformer block, in JAX ☆57 · Updated last year
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆33 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 9 months ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023); a minimal sketch of the recurrence it parallelizes follows this list ☆92 · Updated 4 months ago
- Evaluation of neuro-symbolic engines ☆35 · Updated 8 months ago
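
For context on the Heinsen (2023) entry above: the paper shows that the first-order linear recurrence x_t = a_t · x_{t-1} + b_t, which underlies RNNs and state-space models, can be evaluated with parallel cumulative operations instead of a sequential loop. The sketch below is not that repository's code: `parallel_linear_recurrence` is a hypothetical name, and it uses plain cumulative products rather than the paper's more numerically stable log-space formulation, so it is only a demonstration of the closed form under the assumption of short, well-scaled inputs.

```python
import numpy as np

def parallel_linear_recurrence(a, b, x0=0.0):
    """Compute x_t = a_t * x_{t-1} + b_t for all t without a sequential loop.

    Uses the closed form x_t = P_t * (x0 + sum_{s<=t} b_s / P_s),
    where P_t = prod_{s<=t} a_s. Heinsen (2023) evaluates the same
    expression via cumulative sums in log space for stability; this
    plain-product variant can overflow or underflow on long sequences.
    """
    P = np.cumprod(a)                    # running coefficient products P_t
    return P * (x0 + np.cumsum(b / P))   # x_t for t = 1..T

# Sanity check against the naive sequential loop.
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.5, size=8)
b = rng.normal(size=8)
x, xs = 0.0, []
for at, bt in zip(a, b):
    x = at * x + bt
    xs.append(x)
assert np.allclose(parallel_linear_recurrence(a, b), xs)
```

Both `np.cumprod` and `np.cumsum` are associative prefix scans, which is what makes the recurrence parallelizable on a GPU in O(log T) depth rather than O(T) sequential steps.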