Yuan-ManX / Titans-PyTorch
PyTorch implementation of Titans.
☆21 · Updated 2 months ago
Alternatives and similar repositories for Titans-PyTorch:
Users who are interested in Titans-PyTorch are comparing it to the libraries listed below.
- Here we will test various linear attention designs. ☆60 · Updated 11 months ago
- ☆33 · Updated last year
- My implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆31 · Updated 7 months ago
- Large-scale RWKV v6, v7 (World, ARWKV, PRWKV) inference. Capable of inference by combining multiple states (pseudo-MoE). Easy to deploy o… ☆33 · Updated last week
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated last month
- ☆32 · Updated last week
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated last year
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆28 · Updated 3 weeks ago
- GoldFinch and other hybrid transformer components ☆45 · Updated 8 months ago
- DPO, but faster 🚀 ☆40 · Updated 3 months ago
- A simple PyTorch implementation of high-performance Multi-Query Attention ☆16 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆47 · Updated last week
- Work in progress. ☆50 · Updated 2 weeks ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated 7 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆53 · Updated 7 months ago
- JAX Scalify: end-to-end scaled arithmetics ☆16 · Updated 5 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆31 · Updated 9 months ago
- Lottery Ticket Adaptation ☆39 · Updated 4 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆42 · Updated 2 weeks ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆21 · Updated last year
- ☆31 · Updated last year
- Implementation of a modular, high-performance, and simplistic Mamba for high-speed applications ☆33 · Updated 4 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch ☆23 · Updated 2 months ago
- Train, tune, and run inference with the Bamba model ☆87 · Updated 2 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆33 · Updated last month
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆44 · Updated 3 weeks ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 9 months ago
- Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in PyTorch ☆81 · Updated last month