Yuan-ManX / Titans-PyTorch
PyTorch implementation of Titans.
☆23 Updated 3 months ago
Alternatives and similar repositories for Titans-PyTorch
Users who are interested in Titans-PyTorch compare it to the libraries listed below.
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆32 Updated 9 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆29 Updated this week
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 Updated 2 weeks ago
- ☆34 Updated 2 weeks ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆18 Updated this week
- GoldFinch and other hybrid transformer components ☆45 Updated 9 months ago
- Implementation of a Light Recurrent Unit in PyTorch ☆46 Updated 7 months ago
- RWKV-7: Surpassing GPT ☆85 Updated 6 months ago
- A repository for research on medium-sized language models. ☆76 Updated 11 months ago
- Here we will test various linear attention designs. ☆60 Updated last year
- A large-scale RWKV v6, v7 (World, ARWKV, PRWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy o… ☆35 Updated 2 weeks ago
- Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in PyTorch ☆84 Updated 2 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from DeepMind ☆50 Updated 3 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 Updated last year
- DPO, but faster 🚀 ☆42 Updated 5 months ago
- This is a simple torch implementation of the high performance Multi-Query Attention ☆17 Updated last year
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆36 Updated 2 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 Updated last year
- ☆17 Updated 6 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆44 Updated 2 months ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec" ☆11 Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 Updated 8 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆48 Updated 10 months ago
- Official implementation of ECCV24 paper: POA ☆24 Updated 9 months ago
- Minimal Implementation of Visual Autoregressive Modelling (VAR) ☆33 Updated last month
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆56 Updated last week
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆58 Updated 6 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 Updated 3 weeks ago
- ☆32 Updated last year
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆34 Updated 2 months ago