Oxen-AI / Score-Entropy-Discrete-Diffusion
Modified Score-Entropy-Discrete-Diffusion to train a character-level ML model and integrate it with Oxen
☆14 · Updated last year
Alternatives and similar repositories for Score-Entropy-Discrete-Diffusion
Users who are interested in Score-Entropy-Discrete-Diffusion are comparing it to the libraries listed below.
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆35 · Updated 3 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆36 · Updated 3 months ago
- Implementation of a Light Recurrent Unit in PyTorch ☆47 · Updated 8 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models ☆30 · Updated 3 years ago
- ☆32 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics ☆16 · Updated 7 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Official code implementation for "A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models" ☆19 · Updated 10 months ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆58 · Updated last year
- ☆15 · Updated 6 months ago
- Minimal implementation of Visual Autoregressive Modelling (VAR) ☆33 · Updated 2 months ago
- Large-scale RWKV v6 and v7 (World, PRWKV, Hybrid-RWKV) inference, capable of inference by combining multiple states (pseudo-MoE). Easy to de… ☆35 · Updated last week
- Research implementation of Native Sparse Attention (2502.11089) ☆54 · Updated 3 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆51 · Updated 2 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆56 · Updated last year
- Accompanying repository for the paper "DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions" ☆25 · Updated 3 weeks ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆50 · Updated 3 years ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data ☆18 · Updated 7 months ago
- Implementation of "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" in PyTorch ☆18 · Updated 2 weeks ago
- Continual Resilient (CoRe) Optimizer for PyTorch ☆10 · Updated 11 months ago
- ☆27 · Updated last year
- My explorations into editing the knowledge and memories of an attention network ☆35 · Updated 2 years ago
- ☆21 · Updated 7 months ago
- Implementation of SoundStream from the paper "SoundStream: An End-to-End Neural Audio Codec" ☆12 · Updated 4 months ago
- sigma-MoE layer ☆18 · Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆50 · Updated 3 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆48 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆26 · Updated 7 months ago
- ☆38 · Updated last year