vikhyat / e_natten
Blazingly fast neighborhood attention
☆12 · Updated last year
Alternatives and similar repositories for e_natten
Users that are interested in e_natten are comparing it to the libraries listed below
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 10 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite. ☆33 · Updated last year
- Implementation of a holodeck, written in Pytorch ☆18 · Updated last year
- A dashboard for exploring timm learning rate schedulers ☆19 · Updated 6 months ago
- Utilities for PyTorch distributed ☆24 · Updated 3 months ago
- ☆19 · Updated 3 weeks ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆20 · Updated 11 months ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated last week
- This is the official repo for Gradient Agreement Filtering (GAF). ☆24 · Updated 4 months ago
- Describe the format of image/text datasets ☆11 · Updated 3 years ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆51 · Updated 2 months ago
- Load any clip model with a standardized interface ☆21 · Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆27 · Updated last year
- Latent Large Language Models ☆18 · Updated 9 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Updated 4 months ago
- A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ☆13 · Updated 3 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 · Updated last year
- Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll… ☆27 · Updated last year
- CHARacter-awaRE Diffusion: Multilingual Character-Aware Encoders for Font-Aware Diffusers That Can Actually Spell ☆14 · Updated 2 years ago
- Minimum Description Length probing for neural network representations ☆19 · Updated 4 months ago
- My explorations into editing the knowledge and memories of an attention network ☆35 · Updated 2 years ago
- sigma-MoE layer ☆18 · Updated last year
- DALLE-tools provides useful dataset utilities to improve your workflow with WebDatasets. ☆15 · Updated 3 years ago
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Updated 6 months ago
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence ☆60 · Updated 3 years ago
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 · Updated 2 months ago
- Easily run PyTorch on multiple GPUs & machines ☆46 · Updated 2 months ago
- A sample pattern for running CI tests on Modal ☆18 · Updated last month
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆50 · Updated 3 years ago