devvrit / SONew
☆9 Updated last year
Alternatives and similar repositories for SONew
Users who are interested in SONew are comparing it to the repositories listed below
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 Updated 2 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆82 Updated last year
- ☆34 Updated 5 months ago
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆28 Updated 4 years ago
- Blog post ☆17 Updated last year
- Parallel Associative Scan for Language Models ☆17 Updated last year
- ☆31 Updated last year
- ☆25 Updated 2 years ago
- ☆32 Updated 7 months ago
- ☆29 Updated 5 months ago
- ☆53 Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆19 Updated 3 months ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago
- Code for the paper "The emergence of clusters in self-attention dynamics." ☆15 Updated last year
- Scalable Computation of Hessian Diagonals ☆13 Updated 11 months ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆50 Updated 3 years ago
- Efficient PScan implementation in PyTorch ☆16 Updated last year
- Efficient Scaling laws and collaborative pretraining. ☆16 Updated 3 months ago
- Code for "Semi-Discrete Normalizing Flows through Differentiable Tessellation" ☆26 Updated 2 years ago
- ☆18 Updated 2 years ago
- Code for our paper "Generative Flow Networks for Discrete Probabilistic Modeling" ☆82 Updated 2 years ago
- Transformers with doubly stochastic attention ☆45 Updated 2 years ago
- ☆17 Updated 8 months ago
- Triton Implementation of HyperAttention Algorithm ☆48 Updated last year
- ☆49 Updated 4 years ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆33 Updated last year
- ☆31 Updated 2 years ago
- Code for our TMLR paper "Distributional GFlowNets with Quantile Flows". ☆10 Updated last year
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS '22) ☆16 Updated 2 years ago
- ☆23 Updated 7 months ago